How to unpack nested JSON into a Python dataclass

Dataclass example:
@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str
@dataclass
class List:
    id: int
    statuses: List[StatusElement]
JSON example:
json = {
    "id": "124",
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}
I can unpack the JSON doing something like this:
object = List(**json)
But I'm not sure how I can also unpack each entry in statuses into a StatusElement object and append it to the statuses list of the List object. I'm sure I need to loop over it somehow, but I'm not sure how to combine that with unpacking.

Python's dataclasses module is great, but unfortunately one thing it doesn't handle is parsing a JSON object into a nested dataclass structure.
A few workarounds exist for this:
You can roll your own JSON parsing helper method, for example a from_json which converts a JSON string to a List instance with a nested dataclass.
You can make use of existing JSON serialization libraries. For example, pydantic is a popular one that supports this use case.
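As a rough illustration of the first option, here is a minimal standard-library sketch of a hand-rolled helper. The names TaskList and from_dict are assumptions made for this example (TaskList is used instead of List so the class does not shadow typing.List), and the explicit int(...) call stands in for the type conversion a library would do for you. It builds the nested StatusElement objects in a loop first and then constructs the outer object.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class TaskList:  # hypothetical name, avoids shadowing typing.List
    id: int
    statuses: List[StatusElement]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "TaskList":
        # Unpack every nested dict into a StatusElement first.
        statuses = [StatusElement(**item) for item in data["statuses"]]
        # The sample payload sends "id" as a string, so convert it explicitly.
        return cls(id=int(data["id"]), statuses=statuses)


payload = {
    "id": "124",
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open",
        }
    ],
}

task_list = TaskList.from_dict(payload)
print(task_list)
```

This only works while the JSON keys match the field names exactly; StatusElement(**item) raises a TypeError on any unexpected or missing key, which is one reason to reach for a library once the structure grows.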
Here is an example using the dataclass-wizard library that works well enough for your use case. It's more lightweight than pydantic and coincidentally also a little faster. It also supports automatic case transforms and type conversions (for example, a str value to a field annotated as int).
Example below:
from dataclasses import dataclass
from typing import List as PyList
from dataclass_wizard import JSONWizard
@dataclass
class List(JSONWizard):
    id: int
    statuses: PyList['StatusElement']
    # on Python 3.9+ you can use the following syntax:
    #   statuses: list['StatusElement']
@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str
json = {
    "id": "124",
    "statuses": [
        {
            "status": "to do",
            "orderIndex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}
object = List.from_dict(json)
print(repr(object))
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])
Disclaimer: I am the creator (and maintainer) of this library.
You can now skip the class inheritance as of the latest release of dataclass-wizard. It's straightforward to use: this is the same example as above, with the JSONWizard usage removed completely. Just remember not to import asdict from the dataclasses module, even though I guess that should coincidentally work.
Here's the modified version of the above without class inheritance:
from dataclasses import dataclass
from typing import List as PyList
from dataclass_wizard import fromdict, asdict
@dataclass
class List:
    id: int
    statuses: PyList['StatusElement']
@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str
json = {
    "id": "124",
    "statuses": [
        {
            "status": "to do",
            "orderIndex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}
# De-serialize the JSON dictionary into a `List` instance.
c = fromdict(List, json)
print(c)
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])
# Convert the instance back to a dictionary object that is JSON-serializable.
d = asdict(c)
print(d)
# {'id': 124, 'statuses': [{'status': 'to do', 'orderIndex': 0, 'color': '#d3d3d3', 'type': 'open'}]}
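For comparison, and only as a standard-library sketch: dataclasses.asdict also recurses into nested dataclasses and lists, but it keeps the field names exactly as declared, so the key comes out as order_index rather than the orderIndex seen in the output above. TaskList is a hypothetical stand-in for the List class, renamed here so it does not clash with typing.List.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


@dataclass
class TaskList:  # hypothetical stand-in for the `List` class above
    id: int
    statuses: List[StatusElement]


instance = TaskList(
    id=124,
    statuses=[StatusElement("to do", 0, "#d3d3d3", "open")],
)

# The stdlib asdict walks nested dataclasses and lists recursively,
# leaving snake_case field names untouched.
plain = asdict(instance)
print(plain)
# {'id': 124, 'statuses': [{'status': 'to do', 'order_index': 0, 'color': '#d3d3d3', 'type': 'open'}]}
```

So if the consumer of the JSON expects the original camelCase keys, the two functions are not interchangeable.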
Also, here's a quick performance comparison with dacite. I wasn't aware of this library before, but it's also very easy to use (and there's also no need to inherit from any class). However, from my personal tests - Windows 10 Alienware PC using Python 3.9.1 - dataclass-wizard seemed to perform much better overall on the de-serialization process.
from dataclasses import dataclass
from timeit import timeit
from typing import List
from dacite import from_dict
from dataclass_wizard import JSONWizard, fromdict
data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}
@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str
@dataclass
class List:
    id: int
    statuses: List[StatusElement]
class ListWiz(List, JSONWizard):
    ...
n = 100_000
# 0.37
print('dataclass-wizard: ', timeit('ListWiz.from_dict(data)', number=n, globals=globals()))
# 0.36
print('dataclass-wizard (fromdict): ', timeit('fromdict(List, data)', number=n, globals=globals()))
# 11.2
print('dacite: ', timeit('from_dict(List, data)', number=n, globals=globals()))
lst_wiz1 = ListWiz.from_dict(data)
# dataclass-wizard's helper function (fromdict), not dacite's from_dict
lst_wiz2 = fromdict(List, data)
lst = from_dict(List, data)
# True
assert lst.__dict__ == lst_wiz1.__dict__ == lst_wiz2.__dict__

A "cleaner" solution (in my eyes): use dacite.
No need to inherit anything.
from dataclasses import dataclass
from typing import List
from dacite import from_dict
data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}
@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str
@dataclass
class List:
    id: int
    statuses: List[StatusElement]
lst: List = from_dict(List, data)
print(lst)
Output:
List(id=124, statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])

I've spent a few hours investigating options for this. There's no native Python functionality to do this, but there are a few third-party packages (writing in November 2022):
marshmallow_dataclass has this functionality (you need not be using marshmallow in any other capacity in your project). It gives good error messages and the package is actively maintained. I used this for a while before hitting what I believe is a bug parsing a large and complex JSON into deeply nested dataclasses, and then had to switch away.
dataclass-wizard is easy to use and specifically addresses this use case. It has excellent documentation. One significant disadvantage is that it won't automatically attempt to find the right fit for a given JSON, if trying to match against a union of dataclasses (see https://dataclass-wizard.readthedocs.io/en/latest/common_use_cases/dataclasses_in_union_types.html). Instead it asks you to add a "tag key" to the input JSON, which is a robust solution but may not be possible if you have no control over the input JSON.
dataclass-json is similar to dataclass-wizard, and again doesn't attempt to match the correct dataclass within a union.
dacite is the option I have settled upon for the time being. It has similar functionality to marshmallow_dataclass, at least for JSON parsing. The error messages are significantly less clear than marshmallow_dataclass, but slightly offsetting this, it's easier to figure out what's wrong if you pdb in at the point that the error occurs - the internals are quite clear and you can experiment to see what's going wrong. According to others it is rather slow, but that's not a problem in my circumstance.
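To make the union point concrete, here is a standard-library sketch of what "finding the right fit" within a union of dataclasses can look like: try each member in declaration order and keep the first one that fits. This is not how dacite or marshmallow_dataclass is implemented; the build function and the Circle, Rectangle and Drawing classes are hypothetical names invented for the example. It assumes Python 3.8+, requires the JSON keys to match the dataclass fields exactly, and ignores defaults and Optional fields.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Union, get_args, get_origin, get_type_hints


def build(tp: Any, value: Any) -> Any:
    """Recursively convert parsed JSON `value` into the annotated type `tp`."""
    origin = get_origin(tp)
    if origin is Union:
        # Try each union member in order; the first one that fits wins.
        for candidate in get_args(tp):
            try:
                return build(candidate, value)
            except (TypeError, KeyError):
                continue
        raise ValueError(f"no member of {tp} matches {value!r}")
    if origin is list:
        if not isinstance(value, list):
            raise TypeError(f"expected a list, got {value!r}")
        (item_type,) = get_args(tp)
        return [build(item_type, item) for item in value]
    if is_dataclass(tp):
        if not isinstance(value, dict):
            raise TypeError(f"expected an object, got {value!r}")
        names = {f.name for f in fields(tp)}
        if set(value) != names:
            # Simplification: the keys must match the fields exactly.
            raise KeyError(f"keys {sorted(value)} do not match {tp.__name__}")
        hints = get_type_hints(tp)
        return tp(**{name: build(hints[name], value[name]) for name in names})
    if not isinstance(value, tp):
        raise TypeError(f"expected {tp}, got {value!r}")
    return value


@dataclass
class Circle:
    radius: float


@dataclass
class Rectangle:
    width: float
    height: float


@dataclass
class Drawing:
    name: str
    shapes: List[Union[Circle, Rectangle]]


payload = {
    "name": "demo",
    "shapes": [{"radius": 1.5}, {"width": 2.0, "height": 3.0}],
}
drawing = build(Drawing, payload)
print(drawing)
```

The trade-off mentioned above shows up here too: matching by shape needs no tag key in the input, but two dataclasses with identical fields would be indistinguishable, and the first one listed in the union would always win.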

Related

It is really tedious to write a bunch of data classes for parsing a simple JSON using Kotlin's serialization library. Any better way?

I tried parsing JSON using Kotlin's default serialization library. However, I found it really overwhelming to write a bunch of data classes to deserialize a simple JSON string.
To illustrate,
{
    "artists": {
        "items": [
            {
                "genres": [
                    "desi pop",
                    "filmi",
                    "modern bollywood"
                ],
                "images": [
                    {
                        "url": "https://i.scdn.co/image/ab6761610000e5ebb2b70762d89a9d76c772b3b6"
                    }
                ],
                "name": "Arijit Singh",
                "type": "artist"
            }
        ]
    }
}
For this data, I had to write this many classes:
@Serializable
data class Root(val artists: SubRoot)
@Serializable
data class SubRoot(val items: List<Artist>)
@Serializable
data class Artist(
    val genres: List<String>,
    val images: List<Image>,
    val name: String,
    val type: String
)
@Serializable
data class Image(val url: String)
Does anybody know a better way? Some library with built-in magic that does this kind of thing for me?
If you don't want to use the automatic mapping you can just parse them as JsonElements and do your own thing instead of letting the library map them to those data classes.
https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/json.md#json-elements
For example, if you want to get that url, you could do:
val root = Json.parseToJsonElement(json)
return root
    .jsonObject["artists"]
    ?.jsonObject?.get("items")
    ?.jsonArray?.get(0)
    ?.jsonObject?.get("images")
    ?.jsonArray?.get(0)
    ?.jsonObject?.get("url")
    ?.jsonPrimitive?.content // content is the raw string, without the surrounding quotes
This specific example will return null if any field couldn't be found while traversing the tree. It will give an IllegalArgumentException if any of the casts fail.

Kotlin - Array property in data class error

I'm modelling some JSON using the following lines:
data class Metadata(
    val id: String,
    val creators: Array<CreatorsModel>
)
along with:
data class CreatorsModel(
    val role: String,
    val name: String
)
However, I keep seeing the error: Array property in data class error.
Any ideas why this is?
FYI, the JSON looks like:
{
    "id": "123",
    "creators": [
        {
            "role": "Author",
            "name": "Marie"
        }
    ]
}
In Kotlin you should aim to use List instead of Array where possible. The equals and hashCode that a data class generates compare an Array property by reference rather than by content, so although the compiler will let you use one, the IDE may prompt you to override equals and hashCode manually. Using List will make things much simpler.
You can find out more about the difference here: Difference between List and Array types in Kotlin

Parse JSON array using Scala Argonaut

I'm using Scala & Argonaut, trying to parse the following JSON:
[
    {
        "name": "apple",
        "type": "fruit",
        "size": 3
    },
    {
        "name": "jam",
        "type": "condiment",
        "size": 5
    },
    {
        "name": "beef",
        "type": "meat",
        "size": 1
    }
]
And struggling to work out how to iterate and extract the values into a List[MyType] where MyType will have name, type and size properties.
I will post more specific code soon (I have tried many things), but basically I'm looking to understand how the cursor works and how to iterate through arrays. I have tried using \\ (downArray) to move to the head of the array, then :->- to iterate through the array, but then --\ (downField) is not available (at least IntelliJ doesn't think so).
So the question is, how do I:
navigate to the array
iterate through the array (and know when I'm done)
extract string, integer etc. values for each field - jdecode[String]? as[String]?
The easiest way to do this is to define a codec for MyType. The compiler will then happily construct a decoder for List[MyType], etc. I'll use a plain class here (not a case class) to make it clear what's happening:
class MyType(val name: String, val tpe: String, val size: Int)
import argonaut._, Argonaut._
implicit def MyTypeCodec: CodecJson[MyType] = codec3(
  (name: String, tpe: String, size: Int) => new MyType(name, tpe, size),
  (myType: MyType) => (myType.name, myType.tpe, myType.size)
)("name", "type", "size")
codec3 takes two parameter lists. The first has two parameters, which allow you to tell how to create an instance of MyType from a Tuple3 and vice versa. The second parameter list lets you specify the names of the fields.
Now you can just write something like the following (if json is your string):
Parse.decodeValidation[List[MyType]](json)
And you're done.
Since you don't need to encode and are only looking at decoding, you can do as suggested by Travis, but by implementing a different implicit, MyTypeDecodeJson:
implicit def MyTypeDecodeJson: DecodeJson[MyType] = DecodeJson(
  raw => for {
    name <- raw.get[String]("name")
    // `type` is a reserved word in Scala, so bind the field to `tpe`
    tpe <- raw.get[String]("type")
    size <- raw.get[Int]("size")
  } yield new MyType(name, tpe, size))
Then to parse your list:
Parse.decodeValidation[List[MyType]](jsonString)
Assuming MyType is a case class, the following works too:
// `type` is a reserved word in Scala, so the field is named `tpe`; the JSON key stays "type"
case class MyType(name: String, tpe: String, size: Int)
object MyType {
  implicit val createCodecJson: CodecJson[MyType] = CodecJson.casecodec3(apply, unapply)(
    "name",
    "type",
    "size"
  )
}

Scala/Play: JSON serialization issue

I have a simple custom data structure which I use to map the results from the database:
case class Filter(id: Int, table: String, name: String, Type: String, structure: String)
The resulting object type is List[Filter] and if converted to JSON, it should look something like this:
[
    {
        "id": 1,
        "table": "table1",
        "name": "name1",
        "Type": "type1",
        "structure": "structure1"
    },
    {
        "id": 2,
        "table": "table2",
        "name": "name2",
        "Type": "type2",
        "structure": "structure2"
    }
]
Now when I try to serialize my object into JSON
val result: String = Json.toJson(filters)
I am getting something like
No Json deserializer found for type List[Filter]. Try to implement an implicit Writes or Format for this type.
How do I solve this seemingly simple problem without writing some ridiculous amount of boilerplate?
My stack is Play 2.2.1, Scala 2.10.3, Java 8 64bit
Short answer:
Just add:
implicit val filterWrites = Json.writes[Filter]
Longer answer:
If you look at the definition of Json.toJson, you will see that its complete signature is:
def toJson[T](o: T)(implicit tjs: Writes[T]): JsValue = tjs.writes(o)
Writes[T] knows how to take a T and transform it to a JsValue. You will need to have an implicit Writes[Filter] around that knows how to serialize your Filter instance. The good news is that Play's JSON library comes with a macro that can instantiate those Writes[_] for you, so you don't have to write boring code that transforms your case class's fields into JSON values. To invoke this macro and have its value picked up by implicit search add the line above to your scope.

Parsing bad Json in Scala

I'm trying to parse some problematic Json in Scala using Play Json and using implicit, but not sure how to proceed...
The Json looks like this:
"rules": {
    "Some_random_text": {
        "item_1": "Some_random_text",
        "item_2": "text",
        "item_n": "MoreText",
        "disabled": false,
        "Other_Item": "thing",
        "score": 1
    },
    "Some_other_text": {
        "item_1": "Some_random_text",
        "item_2": "text",
        "item_n": "MoreText",
        "disabled": false,
        "Other_Item": "thing",
        "score": 1
    },
    "Some_more_text": {
        "item_1": "Some_random_text",
        "item_2": "text",
        "item_n": "MoreText",
        "disabled": false,
        "Other_Item": "thing",
        "score": 1
    }
}
I'm using an implicit reader but because each top level item in rules is effectively a different thing I don't know how to address that...
I'm trying to build a case class and I don't actually need the random text heading for each item but I do need each item.
To make my life even harder, after these items there are lots of things in other formats which I really don't need. They are unnamed items which just start:
{
    random legal Json...
},
{
    more Json...
}
I need to end up with the Json I'm parsing in a Seq of case classes.
Thanks for your thoughts.
I'm using an implicit reader but because each top level item in rules is effectively a different thing I don't know how to address that...
Play JSON readers depend on knowing names of fields in advance. That goes for manually constructed readers and also for macro generated readers. You cannot use an implicit reader in this case. You need to do some traversing first and extract pieces of Json that do have regular structure with known names and types of fields. E.g. like this:
case class Item(item_1: String, item_2: String, item_n: String, disabled: Boolean, Other_Item: String, score: Int)
implicit val itemReader: Reads[Item] = Json.reads[Item]
def main(args: Array[String]): Unit = {
  // parse the JSON text and assume that there is a JSON object under the "rules" field
  val rules: JsObject = Json.parse(jsonText).asInstanceOf[JsObject]("rules").asInstanceOf[JsObject]
  // traverse all fields, filter according to field name, collect values
  val itemResults = rules.fields.collect {
    case (heading, jsValue) if heading.startsWith("Some_") => Json.fromJson[Item](jsValue) // use implicit reader here
  }
  // silently ignore read errors and just collect successfully read items
  val items = itemResults.flatMap(_.asOpt)
  items.foreach(println)
}
Prints:
Item(Some_random_text,text,MoreText,false,thing,1)
Item(Some_random_text,text,MoreText,false,thing,1)
Item(Some_random_text,text,MoreText,false,thing,1)