Unable to serialize a nested Python object using json.dumps()

I am new to Python, so sorry about the naive question. I have a simple code snippet where I try to serialize a Python object to JSON using json.dumps():
import json
class Document:
    uid = "1"
    content = "content1"
    domain = "domain"
    title = "title"
class ASSMSchema:
    requestSource = "unittest"
    documents = []
def entry():
    myObj = ASSMSchema()
    myObj.requestSource = "unittest"
    document1 = Document()
    document1.uid = "1"
    document1.content = "content1"
    document1.domain = "domain"
    document1.title = "title"
    myObj.documents.append(document1)
    print(json.dumps(myObj.__dict__))
if __name__ == "__main__":
    entry()
I get the following output when I run the above code
{"requestSource": "unittest"}
This is not what I expected, since it should also serialize the list of Document objects. I appreciate your answers. Thanks in advance!

Your class definition of ASSMSchema defines the class members documents and requestSource. These are not attributes of a single instance of the class; they are shared between all instances. When you run myObj.requestSource = "unittest", you create an instance attribute on myObj, and that attribute is reflected in the output of json.dumps. The class members (like documents) are not, because myObj.__dict__ only contains instance attributes. Note that myObj.documents.append(document1) does not create an instance attribute either: it appends to the single list stored on the class.
For further reading, see https://docs.python.org/3/tutorial/classes.html#class-and-instance-variables
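A minimal sketch reproducing the effect described above. It uses a cut-down ASSMSchema and appends the plain string "doc" as an illustrative stand-in for a Document instance; the point is only to show which attributes end up in `__dict__` and that the class-level list is shared.

```python
import json


class ASSMSchema:
    requestSource = "unittest"  # class attribute, shared by all instances
    documents = []              # class attribute: ONE list shared by all instances


a = ASSMSchema()
b = ASSMSchema()

# Assigning through the instance creates an instance attribute on `a` ...
a.requestSource = "unittest"
# ... but append() only mutates the shared class-level list ("doc" stands in for a Document).
a.documents.append("doc")

print(a.__dict__)                 # only instance attributes are listed here
print(b.documents)                # the append is visible through `b` as well
print("documents" in a.__dict__)  # documents was never set on the instance
print(json.dumps(a.__dict__))     # so json.dumps only sees requestSource
```

This matches the output in the question: only requestSource is serialized, because documents lives on the class rather than on the instance.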
Depending on the complexity and desired maintainability of your program, there are multiple approaches to achieve the behaviour you want. First, you have to fix the mistake in both class definitions. To define a class with instance variables instead of class variables, do something like this:
class Foo:
    # class variables go here
    def __init__(self, field1, field2):
        # This method is called when you write Foo(field1, field2)
        # these are instance variables
        self.field1 = field1
        self.field2 = field2
If you want to dump this class as JSON, you can simply use the trick with __dict__: print(json.dumps(Foo(1,2).__dict__)) will output {"field1": 1, "field2": 2}.
In your case, however, there is the documents member, a list of Document objects, which the default encoder cannot serialize. Therefore, you must handle it separately. You could write an encoder for your ASSMSchema (see this thread for more info on that). It could be implemented roughly like this:
from json import JSONEncoder
class ASSMSchemaEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, ASSMSchema):
            return {
                "requestSource": o.requestSource,
                # Convert the list of Document objects to a list of dicts
                "documents": [d.__dict__ for d in o.documents]
            }
        # Anything else: let the base class raise TypeError
        return super().default(o)
Now, when serializing an instance of ASSMSchema, this implementation is used and the documents member is replaced with a list of dictionaries (which the default encoder can serialize). Note that you have to pass this encoder when calling json.dumps, e.g. json.dumps(myObj, cls=ASSMSchemaEncoder); see the linked thread above.
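Putting both fixes together, a minimal runnable sketch could look like the following. It assumes the constructor signatures shown here (they are not from the question) and keeps the field names from the original code.

```python
import json
from json import JSONEncoder


class Document:
    def __init__(self, uid, content, domain, title):
        # Instance attributes, so they appear in self.__dict__
        self.uid = uid
        self.content = content
        self.domain = domain
        self.title = title


class ASSMSchema:
    def __init__(self, requestSource):
        self.requestSource = requestSource
        # A fresh list per instance instead of one shared class-level list
        self.documents = []


class ASSMSchemaEncoder(JSONEncoder):
    def default(self, o):
        if isinstance(o, ASSMSchema):
            return {
                "requestSource": o.requestSource,
                # Document instances become plain dicts of their attributes
                "documents": [d.__dict__ for d in o.documents],
            }
        # Unknown types: defer to the base class, which raises TypeError
        return super().default(o)


def entry():
    myObj = ASSMSchema("unittest")
    myObj.documents.append(Document("1", "content1", "domain", "title"))
    return json.dumps(myObj, cls=ASSMSchemaEncoder)


if __name__ == "__main__":
    print(entry())
```

Falling back to `super().default(o)` keeps the standard TypeError for unsupported types instead of silently serializing something unexpected.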

Related

Deserializing JSON into Serializable class with generic field - error: Star projections in type arguments are not allowed

Intro
I'm sending JSON messages between two backend servers that use different languages. The producing server creates a variety of JSON messages, wrapped inside a message with metadata.
The wrapping class is Message. The consuming server has to determine which type of message it's receiving based solely on the message contents.
When I try to use a star projection to deserialize the message, I get an error.
Example
import kotlinx.serialization.json.Json
@Language("JSON")
val carJson = """
    {
      "message_type": "some message",
      "data": {
        "info_type": "Car",
        "name": "Toyota"
      }
    }
""".trimIndent()
// normally I wouldn't know what the Json message would be - so the type is Message<*>
val actualCarMessage = Json.decodeFromString<Message<*>>(carJson)
Error message
Exception in thread "main" java.lang.IllegalArgumentException: Star projections in type arguments are not allowed, but Message<*>
at kotlinx.serialization.SerializersKt__SerializersKt.serializerByKTypeImpl$SerializersKt__SerializersKt(Serializers.kt:81)
at kotlinx.serialization.SerializersKt__SerializersKt.serializer(Serializers.kt:59)
at kotlinx.serialization.SerializersKt.serializer(Unknown Source)
at ExampleKt.main(example.kt:96)
at ExampleKt.main(example.kt)
Class structure
I want to deserialize JSON into a data class, Message, that has a field with a generic type.
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
@Serializable
data class Message<out DataType : SpecificInformation>(
    @SerialName("message_type")
    val type: String,
    @SerialName("data")
    val data: DataType,
)
The field is constrained by a sealed interface, SpecificInformation, with some implementations.
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.JsonClassDiscriminator
@JsonClassDiscriminator("info_type")
sealed interface SpecificInformation {
    @SerialName("info_type")
    val infoType: String
}
@Serializable
@SerialName("User")
data class UserInformation(
    @SerialName("info_type")
    override val infoType: String,
    val name: String,
) : SpecificInformation
// there are more implementations...
Workaround?
This is a known issue (kotlinx.serialization/issues/944), so I'm looking for workarounds.
I have control over the JSON structure and libraries, though I have a preference for kotlinx.serialization.
I can't change that there are two JSON objects, one inside the other, and that the discriminator is inside the inner object.
A custom serializer would be great, but I'd prefer to have it configured on the class or file (with @Serializable(with = ...) or @file:UseSerializers(...)), as using a custom SerializersModule is not as seamless.
Attempt: JsonContentPolymorphicSerializer
I've written a custom serializer, which only works if it's used explicitly (which is something I'd like to avoid). It's also quite clunky, breaks if the data classes change or a new one is added, and doesn't benefit from the sealed interface.
Can this be improved so that
it can be used generically, e.g. Json.decodeFromString<Message<*>>(carJson)?
it doesn't have any hard-coded strings?
class MessageCustomSerializer : JsonContentPolymorphicSerializer<Message<*>>(Message::class) {
    override fun selectDeserializer(element: JsonElement): DeserializationStrategy<out Message<*>> {
        val discriminator = element
            .jsonObject["data"]
            ?.jsonObject?.get("info_type")
            ?.jsonPrimitive?.contentOrNull
        println("found discriminator $discriminator")
        val subclassSerializer = when (discriminator?.lowercase()) {
            "user" -> UserInformation.serializer()
            "car" -> CarInformation.serializer()
            else -> throw IllegalStateException("could not find serializer for $discriminator")
        }
        println("found subclassSerializer $subclassSerializer")
        return Message.serializer(subclassSerializer)
    }
}
fun main() {
    @Language("JSON")
    val carJson = """
        {
          "message_type": "another message",
          "data": {
            "info_type": "Car",
            "brand": "Toyota"
          }
        }
    """.trimIndent()
    val actualCarMessage =
        Json.decodeFromString(MessageCustomSerializer(), carJson)
    val expectedCarMessage = Message("another message", CarInformation("Car", "Toyota"))
    require(actualCarMessage == expectedCarMessage) {
        println("car json parsing ❌")
    }
    println("car json parsing ✅")
}
@Serializable(with = ...) - infinite loop
I tried applying MessageCustomSerializer directly to Message...
@Serializable(with = MessageCustomSerializer::class)
data class Message<out T : SpecificInformation>(
    //...
But then I couldn't access the plugin-generated serializer, and this causes an infinite loop.
return Message.serializer(subclassSerializer) // calls 'MessageCustomSerializer', causes infinite loop
@Serializer(forClass = ...) - not generic
In addition to annotating Message with @Serializable(with = MessageCustomSerializer::class), I tried deriving a plugin-generated serializer:
@Serializer(forClass = Message::class)
object MessagePluginGeneratedSerializer : KSerializer<Message<*>>
But this serializer is not generic, and causes an error
java.lang.AssertionError: No such value argument slot in IrConstructorCallImpl: 0 (total=0).
Symbol: MessageCustomSerializer.<init>|-5645683436151566731[0]
at org.jetbrains.kotlin.ir.expressions.IrMemberAccessExpressionKt.throwNoSuchArgumentSlotException(IrMemberAccessExpression.kt:66)
at org.jetbrains.kotlin.ir.expressions.IrFunctionAccessExpression.putValueArgument(IrFunctionAccessExpression.kt:31)
at org.jetbrains.kotlinx.serialization.compiler.backend.ir.IrBuilderExtension$DefaultImpls.irInvoke(GeneratorHelpers.kt:210)
at org.jetbrains.kotlinx.serialization.compiler.backend.ir.SerializableCompanionIrGenerator.irInvoke(SerializableCompanionIrGenerator.kt:35)
You are asking many things here, so I will simply try to give some pointers in regards to the errors you are making which you seem to be stuck on. With those in mind, and reading the documentation I link to, I believe you should be able to resolve the rest yourself.
Polymorphic serialization
Acquaint yourself with kotlinx.serialization polymorphic serialization. When you are trying to serialize Message<*> and DataType you are trying to use polymorphic serialization.
In case you are serializing Message<*> as the root object, specifying PolymorphicSerializer explicitly (as I also posted in the bug report you link to) should work. E.g., Json.decodeFromString( PolymorphicSerializer( Message::class ), carJson ).
P.S. I'm not 100% certain that what you are trying to do here is the same as in the bug report. Either way, specifying the serializer explicitly should work, whether or not it is a bug that you have to do so.
The message_type and info_type fields you have in Message and DataType respectively are class discriminators. You need to configure this in your Json settings, and set the correct SerialName on your concrete classes for them to work. Using a different class discriminator per hierarchy is only possible starting from kotlinx.serialization 1.3.0 using @JsonClassDiscriminator.
Overriding plugin-generated serializer
But then I couldn't access the plugin-generated serializer, and this causes an infinite loop.
@Serializable(with = ...) overrides the plugin-generated serializer. If you want to retain the plugin-generated serializer, do not apply with.
When you are serializing the object directly (as the root object), you can still pass a different serializer to use as the first parameter to encode/decode. When you want to override the serializer to use for a specific property nested somewhere in the root object, use @Serializable on the property.
Polymorphism and generic classes
The "No such value argument slot in IrConstructorCallImpl: 0" error is to be expected.
You need to do more work in case you want to specify a serializer for polymorphic generic classes.

Python objects in dealloc in cython

In the docs it is written, that "Any C data that you explicitly allocated (e.g. via malloc) in your __cinit__() method should be freed in your __dealloc__() method."
This is not my case. I have the following extension class:
cdef class SomeClass:
    cdef dict data
    cdef void * u_data
    def __init__(self, data_len):
        self.data = {'columns': []}
        if data_len > 0:
            self.data.update({'data': deque(maxlen=data_len)})
        else:
            self.data.update({'data': []})
        self.u_data = <void *>self.data
    @property
    def data(self):
        return self.data
    @data.setter
    def data(self, new_val: dict):
        self.data = new_val
Some C function has access to this class and appends some data to the SomeClass().data dict. What should I write in __dealloc__ when I want to delete an instance of SomeClass?
Maybe something like:
def __dealloc__(self):
    self.data = None
    free(self.u_data)
Or there is no need to dealloc anything at all?
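No, you don't need to, and no, you shouldn't. In particular, u_data just points at the dict owned by self.data, which was never allocated with malloc, so calling free() on it would be an error. From the documentation: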
You need to be careful what you do in a __dealloc__() method. By the time your __dealloc__() method is called, the object may already have been partially destroyed and may not be in a valid state as far as Python is concerned, so you should avoid invoking any Python operations which might touch the object. In particular, don’t call any other methods of the object or do anything which might cause the object to be resurrected. It’s best if you stick to just deallocating C data.
You don’t need to worry about deallocating Python attributes of your object, because that will be done for you by Cython after your __dealloc__() method returns.
You can confirm this by inspecting the C code (you need to look at the full code, not just the annotated HTML). There's an autogenerated function, __pyx_tp_dealloc_9someclass_SomeClass (the name may vary slightly depending on what you called your module), that does a range of things, including:
__pyx_pw_9someclass_9SomeClass_3__dealloc__(o);
/* some other code */
Py_CLEAR(p->data);
where the function __pyx_pw_9someclass_9SomeClass_3__dealloc__ is (a wrapper for) your user-defined __dealloc__. Py_CLEAR releases the reference held in data and sets the field to NULL.
It's a little hard to follow because it all goes through several layers of wrappers, but you can confirm that it does what the documentation says.

Scala - Circe - Case Class Serialization without Class Name

I have used Circe previously for case class serialization/deserialization, and love how it can be used without the boilerplate code required by other Scala JSON libraries, but I'm now running into an issue I'm not sure how to resolve. I have an ADT (a sealed trait with several case class instances) that I would like to treat generically from my Akka HTTP service (using akka-http-json), i.e. return a List[Foo], where Foo is the trait type. When I do so using Circe's auto-derivation (via Shapeless), it serializes the instances using the specific case class name as a discriminator (e.g., if my List[Foo] contains instances of Foo1, then each element in the resulting serialized list will have the key Foo1).
I would like to eliminate the type name as a discriminator: instead of having each element in the sequence wrapped in the type name, e.g. "Foo1": {"id": "1", "name": "First", ...}, I just want each case class instance serialized to its own fields, e.g. {"id": "1", "name": "First", ...}. I don't want the front end to have to know which concrete case class each element belongs to on the back end. All elements in the list to be serialized will be of the same concrete type, and all are subtypes of my ADT (trait) type.
I believe this can be done using Circe's semi-auto derivation, though I haven't had a chance to figure out exactly how. Basically, I would like to use as much of Circe's auto-derivation as possible, but keep the outer-level class names out of the resulting JSON. Any help or suggestions would be very much appreciated! Thanks!
You can do it by following the instructions in the documentation: https://circe.github.io/circe/codecs/adt.html
import cats.syntax.functor._
import io.circe.{ Decoder, Encoder }, io.circe.generic.auto._
import io.circe.syntax._
object GenericDerivation {
  implicit val encodeEvent: Encoder[Event] = Encoder.instance {
    case foo @ Foo(_) => foo.asJson
    case bar @ Bar(_) => bar.asJson
    case baz @ Baz(_) => baz.asJson
    case qux @ Qux(_) => qux.asJson
  }
  implicit val decodeEvent: Decoder[Event] =
    List[Decoder[Event]](
      Decoder[Foo].widen,
      Decoder[Bar].widen,
      Decoder[Baz].widen,
      Decoder[Qux].widen
    ).reduceLeft(_ or _)
}
import GenericDerivation._
import io.circe.parser.decode
decode[Event]("""{ "i": 1000 }""")
// res0: Either[io.circe.Error,Event] = Right(Foo(1000))
(Foo(100): Event).asJson.noSpaces
// res1: String = {"i":100}
This may not be the best answer, but after some more searching this is what I've been able to find. Instead of having the class name as a key in the JSON produced, it can be serialized as a field, as follows:
implicit val genDevConfig: Configuration = Configuration.default.withDiscriminator("type")
(you can use whatever field name here you'd like; Travis Brown's previous example for a similar issue used a field named what_am_i). So my apologies-- I do not yet know if there is a canonical or widely accepted solution to this problem, especially one that will easily work with Akka Http, using libraries such as akka-http-json, where I still seem to be encountering some issues, though I'm sure I'm probably overlooking something obvious! Anyway, my apologies for asking a question that seems to come up repeatedly!

JSON Encoding custom class not calling overriden default

I have a class definition which I have based on json.JSONEncoder, within which I have overridden the default method. Now, when I call json.dumps on an instance of that class, the default method is not being called. Is there something I have missed?
In my example code I do not expect this to magically produce the serialized object but I would expect the print("here") to be executed.
import json
class MyClass(json.JSONEncoder):
    id = "myId"
    data = "myData"
    def default(self, o):
        print("here")
print ("Create instance")
obj = MyClass()
print("Serialize")
print(json.dumps(obj))
print ("and done")
I am quite new to Python, so apologies if this is something horribly obvious.
After some further digging and tracing, I think I have found the cause. Part of the issue, I think, is my own misunderstanding of how this is intended to be used.
When calling json.dumps, if you wish to use a custom encoder you need to pass the class of that encoder via the cls argument; otherwise dumps defaults to using the standard JSONEncoder implementation.
json.dumps(obj, cls=MyEncoder)
My misconception was that, by basing my class on json.JSONEncoder, dumps would simply recognise the instance as inheriting from JSONEncoder and call the overridden default method. However, this is not the case.
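A small sketch reproducing that behaviour. It differs from the question in one assumed detail: `default()` here returns a dict (the original only printed), so that the `cls=` call has something to serialize.

```python
import json


class MyClass(json.JSONEncoder):
    id = "myId"
    data = "myData"

    def default(self, o):
        print("here")
        # Assumption for illustration: actually return something serializable
        return {"id": o.id, "data": o.data}


obj = MyClass()

# Without cls, dumps builds a plain JSONEncoder internally, so MyClass.default
# is never consulted and the call raises TypeError.
try:
    json.dumps(obj)
except TypeError as exc:
    error_message = str(exc)

# With cls=MyClass, dumps instantiates MyClass as the encoder,
# so its default() is called (printing "here") for the non-serializable obj.
encoded = json.dumps(obj, cls=MyClass)
print(encoded)
```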
I have now moved the logic for encoding my own classes/types into its own encoder class, and when I call json.dumps I pass in that class.
So I now have
class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, MyClass):
            return {"id": obj.id, "data": obj.data}
        return json.JSONEncoder.default(self, obj)
And when I wish to serialize I use
json.dumps(object_to_serialize, cls=MyEncoder)
Which recognises my class and handles it, or passes on the encoding to the default encoder.
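A self-contained sketch of the final approach, assuming MyClass no longer inherits from JSONEncoder and takes its values in a constructor (the constructor and the `payload` dict are illustrative, not from the original). It also shows that the encoder handles MyClass instances nested inside ordinary containers.

```python
import json


class MyClass:
    # Plain data class; it no longer inherits from json.JSONEncoder
    def __init__(self, id_="myId", data="myData"):
        self.id = id_
        self.data = data


class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, MyClass):
            return {"id": obj.id, "data": obj.data}
        # Unknown types: defer to the base class, which raises TypeError
        return json.JSONEncoder.default(self, obj)


# default() is called for every MyClass found anywhere in the structure
payload = {"items": [MyClass(), MyClass("other", 42)], "count": 2}
print(json.dumps(payload, cls=MyEncoder))
```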

No instance of play.api.libs.json.Format is available for models.AccountStatus in the implicit scope

No instance of play.api.libs.json.Format is available for models.AccountStatus in the implicit scope.
This code is taken from a GitHub page; only the class and variable names have been changed.
package models
import slick.jdbc.H2Profile._
import play.api.libs.json._
case class Account(id: Long, name: String, category: Int, status:AccountStatus)
object Account {
  implicit val accountFormat = Json.format[Account]
}
sealed abstract class AccountStatus(val as:Int)
object AccountStatus{
  final case object Draft extends AccountStatus(0)
  final case object Active extends AccountStatus(1)
  final case object Blocked extends AccountStatus(2)
  final case object Defaulter extends AccountStatus(3)
  implicit val columnType: BaseColumnType[AccountStatus] = MappedColumnType.base[AccountStatus,Int](AccountStatus.toInt, AccountStatus.fromInt)
  private def toInt(as:AccountStatus):Int = as match {
    case Draft => 0
    case Active => 1
    case Blocked => 2
    case Defaulter => 3
  }
  private def fromInt(as: Int): AccountStatus = as match {
    case 0 => Draft
    case 1 => Active
    case 2 => Blocked
    case 3 => Defaulter
    case _ => sys.error("Out of bound AccountStatus Value.")
  }
}
https://github.com/playframework/play-scala-slick-example/blob/2.6.x/app/models/Person.scala
So, this code needs to be added inside of the object AccountStatus code block since we need to use fromInt to transform an Int to an AccountStatus. This is a Reads defined for AccountStatus:
implicit object AccountStatusReads extends Reads[AccountStatus] {
  def reads(jsValue: JsValue): JsResult[AccountStatus] = {
    (jsValue \ "as").validate[Int].map(fromInt)
  }
}
What's a Reads? It's just a trait that defines how a JsValue (the Play class encapsulating JSON values) should be deserialized from JSON into some type. The trait only requires one method to be implemented: a reads method, which takes in some JSON and returns a JsResult of some type. So you can see in the above code that we have a Reads that will look for a field in the JSON called as and try to read it as an integer. From there, it transforms it into an AccountStatus using the already-defined fromInt method. For example, in the Scala console you could do this:
import play.api.libs.json._
// import wherever account status is and the above reader
scala> Json.parse("""{"as":1}""").as[AccountStatus]
res0: AccountStatus = Active
This reader isn't perfect, though, mainly because it doesn't handle the error your code throws on out-of-bounds numbers:
scala> Json.parse("""{"as":20}""").as[AccountStatus]
java.lang.RuntimeException: Out of bound AccountStatus Value.
at scala.sys.package$.error(package.scala:27)
at AccountStatus$.fromInt(<console>:42)
at AccountStatusReads$$anonfun$reads$1.apply(<console>:27)
at AccountStatusReads$$anonfun$reads$1.apply(<console>:27)
at play.api.libs.json.JsResult$class.map(JsResult.scala:81)
at play.api.libs.json.JsSuccess.map(JsResult.scala:9)
at AccountStatusReads$.reads(<console>:27)
at play.api.libs.json.JsValue$class.as(JsValue.scala:65)
at play.api.libs.json.JsObject.as(JsValue.scala:166)
... 42 elided
You could handle this by making the Reads handle the error. I can show you how if you want, but first, the other part of a Format is a Writes. This trait, unsurprisingly, is similar to Reads except that it does the reverse: you're taking your AccountStatus class and creating a JsValue (JSON). So, you just have to implement the writes method.
implicit object AccountStatusWrites extends Writes[AccountStatus] {
  def writes(as: AccountStatus): JsValue = {
    JsObject(Seq("as" -> JsNumber(as.as)))
  }
}
Then this can be used to serialize that class to JSON like so:
scala> Json.toJson(Draft)
res4: play.api.libs.json.JsValue = {"as":0}
Now, this is actually enough to get your error to go away. Why? Because Json.format[Account] is doing all the work we just did for you! But for Account. It can do this because it's a case class and has less than 22 fields. Also every field for Account has a way to be converted to and from JSON (via a Reads and Writes). Your error message was showing that Account could not have a format automatically created for it because part of it (status field) had no formatter.
Now, why do you have to do this? Because AccountStatus is not a case class, so you can't call Json.format[AccountStatus] on it. And because the subclasses of it are each objects, which have no unapply method defined for them since they're not case classes. So you have to explain to the library how to serialize and deserialize.
Since you said you're new to scala, I imagine that the concept of an implicit is still somewhat foreign. I recommend you play around with it / do some reading to get a grasp of what to do when you see that the compiler is complaining about not being able to find an implicit it needs.
Bonus round
So, you might really not want to do that work yourself, and there is a way to avoid having to do it so you can do Json.format[AccountStatus]. You see Json.format uses the apply and unapply methods to do its dirty work. In scala, these two methods are defined automatically for case classes. But there's no reason you can't define them yourself and get everything they give you for free!
So, what do apply and unapply look like type signature wise? It changes per class, but in this case apply should match Int => AccountStatus (a function that goes from an int to an AccountStatus). So it's defined like so:
def apply(i: Int): AccountStatus = fromInt(i)
and unapply is similar to the reverse of this, but it needs to return an Option[Int], so it looks like
def unapply(as: AccountStatus): Option[Int] = Option(as.as)
with both of these defined you don't need to define the reads and writes yourself and instead can just call
// this is still inside the AccountStatus object { ... }
implicit val asFormat = Json.format[AccountStatus]
and it will work in a similar fashion.
P.S. I'm traveling today, but feel free to leave a comment if any of this doesn't make sense and I'll try to get back to you later.