JSON Serialization is empty when serialising from eulxml in python - json

I am working with eXistDB in python and leveraging the eulxml library to handle mapping from the xml in the database into custom objects. I want to then serialize these objects to json (for another application to consume) but I'm running into issues. jsonpickle doesn't work (it ends up returning all sorts of excess garbage and the value are the fields aren't actually encoded but rather their eulxml type) and the standard json.dumps() is simply giving me empty json (this was after trying to implement the solution detailed here). The problem seems to stem from the fact that the __dict__ values are not initialised __oninit__ (as they are mapped as class properties) so the __dict__ appears empty upon serialization. Here is some sample code:
Serializable Class Object
class Serializable(dict):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# hack to fix _json.so make_encoder serialize properly
self.__setitem__('dummy', 1)
def _myattrs(self):
return [
(x, self._repr(getattr(self, x)))
for x in self.__dir__()
if x not in Serializable().__dir__()
]
def _repr(self, value):
if isinstance(value, (str, int, float, list, tuple, dict)):
return value
else:
return repr(value)
def __repr__(self):
return '<%s.%s object at %s>' % (
self.__class__.__module__,
self.__class__.__name__,
hex(id(self))
)
def keys(self):
return iter([x[0] for x in self._myattrs()])
def values(self):
return iter([x[1] for x in self._myattrs()])
def items(self):
return iter(self._myattrs())
Base Class
from eulxml import xmlmap
import inspect
import lxml
import json as JSON
from models.serializable import Serializable
class AlcalaBase(xmlmap.XmlObject,Serializable):
def toJSON(self):
return JSON.dumps(self, indent=4)
def to_json(self, skipBegin=False):
json = list()
if not skipBegin:
json.append('{')
json.append(str.format('"{0}": {{', self.ROOT_NAME))
for attr, value in inspect.getmembers(self):
if (attr.find("_") == -1
and attr.find("serialize") == -1
and attr.find("context") == -1
and attr.find("node") == -1
and attr.find("schema") == -1):
if type(value) is xmlmap.fields.NodeList:
if len(value) > 0:
json.append(str.format('"{0}": [', attr))
for v in value:
json.append(v.to_json())
json.append(",")
json = json[:-1]
json.append("]")
else:
json.append(str.format('"{0}": null', attr))
elif (type(value) is xmlmap.fields.StringField
or type(value) is str
or type(value) is lxml.etree._ElementUnicodeResult):
value = JSON.dumps(value)
json.append(str.format('"{0}": {1}', attr, value))
elif (type(value) is xmlmap.fields.IntegerField
or type(value) is int
or type(value) is float):
json.append(str.format('"{0}": {1}', attr, value))
elif value is None:
json.append(str.format('"{0}": null', attr))
elif type(value) is list:
if len(value) > 0:
json.append(str.format('"{0}": [', attr))
for x in value:
json.append(x)
json.append(",")
json = json[:-1]
json.append("]")
else:
json.append(str.format('"{0}": null', attr))
else:
json.append(value.to_json(skipBegin=True))
json.append(",")
json = json[:-1]
if not skipBegin:
json.append('}')
json.append('}')
return ''.join(json)
Sample Class that implements Base
from eulxml import xmlmap
from models.alcalaMonth import AlcalaMonth
from models.alcalaBase import AlcalaBase
class AlcalaPage(AlcalaBase):
ROOT_NAME = "page"
id = xmlmap.StringField('pageID')
year = xmlmap.IntegerField('content/#yearID')
months = xmlmap.NodeListField('content/month', AlcalaMonth)
The toJSON() method on the base is the method that is using the Serializable class and is returning empty json, e.g. "{}". The to_json() is my attempt to for a json-like implementation but that has it's own problems (for some reason it skips certain properties / child objects for no reason I can see but thats a thread for another day).
If I attempt to access myobj.keys or myobj.values (both of which are exposed via Serializable) I can see property names and values as I would expect but I have no idea why json.dumps() produces an empty json string.
Does anyone have any idea why I cannot get these objects to serialize to json?! I've been pulling my hair out for weeks with this. Any help would be greatly appreciated.

So after a lot of playing around, I was finally able to fix this with jsonpickle and it took only 3 lines of code:
def toJson(self):
jsonpickle.set_preferred_backend('simplejson')
return jsonpickle.encode(self, unpicklable=False)
I used simplejson to eliminate some of the additional object notation that was being added and the unpicklable property removed the rest (I'm not sure if this would work with the default json backend as I didn't test it).
Now when I call toJson() on any object that inherits from this base class, I get very nice json and it works brilliantly.

Related

dumping YAML with tags as JSON

I know I can use ruamel.yaml to load a file with tags in it. But when I want to dump without them i get an error. Simplified example :-
from ruamel.yaml import YAML
from json import dumps
import sys
yaml = YAML()
data = yaml.load(
"""
!mytag
a: 1
b: 2
c: 2022-05-01
"""
)
try:
yaml2 = YAML(typ='safe', pure=True)
yaml.default_flow_style = True
yaml2.dump(data, sys.stdout)
except Exception as e:
print('exception dumping using yaml', e)
try:
print(dumps(data))
except Exception as e:
print('exception dumping using json', e)
exception dumping using cannot represent an object: ordereddict([('a', 1), ('b', 2), ('c', datetime.date(2022, 5, 1))])
exception dumping using json Object of type date is not JSON serializable
I cannot change the load() without getting an error on the tag. How to get output with tags stripped (YAML or JSON)?
You get the error because the neither the safe dumper (pure or not), nor JSON, do know about the ruamel.yaml internal
types that preserve comments, tagging, block/flow-style, etc.
Dumping as YAML, you could register these types with alternate dump methods. As JSON this is more complex
as AFAIK you can only convert the leaf-nodes (i.e. the YAML scalars, you would e.g. be
able to use that to dump the datetime.datetime instance that is loaded as the value of key c).
I have used YAML as a readable, editable and programmatically updatable config file with
an much faster loading JSON version of the data used if its file is not older than the corresponding YAML (if
it is older JSON gets created from the YAML). The thing to do in order to dump(s) is
recursively generate Python primitives that JSON understands.
The following does so, but there are other constructs besides datetime
instances that JSON doesn't allow. E.g. when using sequences or dicts
as keys (which is allowed in YAML, but not in JSON). For keys that are
sequences I concatenate the string representation of the elements
:
from ruamel.yaml import YAML
import sys
import datetime
import json
from collections.abc import Mapping
yaml = YAML()
data = yaml.load("""\
!mytag
a: 1
b: 2
c: 2022-05-01
[d, e]: !myseq [42, 196]
{f: g, 18: y}: !myscalar x
""")
def json_dump(data, out, indent=None):
def scalar(obj):
if obj is None:
return None
if isinstance(obj, (datetime.date, datetime.datetime)):
return str(obj)
if isinstance(obj, ruamel.yaml.scalarbool.ScalarBoolean):
return obj == 1
if isinstance(obj, bool):
return bool(obj)
if isinstance(obj, int):
return int(obj)
if isinstance(obj, float):
return float(obj)
if isinstance(obj, tuple):
return '_'.join([str(x) for x in obj])
if isinstance(obj, Mapping):
return '_'.join([f'{k}-{v}' for k, v in obj.items()])
if not isinstance(obj, str): print('type', type(obj))
return obj
def prep(obj):
if isinstance(obj, dict):
return {scalar(k): prep(v) for k, v in obj.items()}
if isinstance(obj, list):
return [prep(elem) for elem in obj]
if isinstance(obj, ruamel.yaml.comments.TaggedScalar):
return prep(obj.value)
return scalar(obj)
res = prep(data)
json.dump(res, out, indent=indent)
json_dump(data, sys.stdout, indent=2)
which gives:
{
"a": 1,
"b": 2,
"c": "2022-05-01",
"d_e": [
42,
196
],
"f-g_18-y": "x"
}

json.dumps not working with custom python object

I am not able to serialize a custom object in python 3 .
Below is the explanation and code
packages= []
data = {
"simplefield": "value1",
"complexfield": packages,
}
where packages is a list of custom object Library.
Library object is a class as below (I have also sub-classed json.JSONEncoder but it is not helping )
class Library(json.JSONEncoder):
def __init__(self, name):
print("calling Library constructor ")
self.name = name
def default(self, object):
print("calling default method ")
if isinstance(object, Library):
return {
"name": object.name
}
else:
raiseExceptions("Object is not instance of Library")
Now I am calling json.dumps(data) but it is throwing below exception.
TypeError: Object of type `Library` is not JSON serializable
It seems "calling default method" is not printed meaning, Library.default method is not called
Can anybody please help here ?
I have also referred to Serializing class instance to JSON but its is not helping much
Inheriting json.JSONEncoder will not make a class serializable. You should define your encoder separately and then provide the encoder instance in the json.dumps call.
See an example approach below
import json
from typing import Any
class Library:
def __init__(self, name):
print("calling Library constructor ")
self.name = name
def __repr__(self):
# I have just created a serialized version of the object
return '{"name": "%s"}' % self.name
# Create your custom encoder
class CustomEncoder(json.JSONEncoder):
def default(self, o: Any) -> Any:
# Encode differently for different types
return str(o) if isinstance(o, Library) else super().default(o)
packages = [Library("package1"), Library("package2")]
data = {
"simplefield": "value1",
"complexfield": packages,
}
#Use your encoder while serializing
print(json.dumps(data, cls=CustomEncoder))
Output was like:
{"simplefield": "value1", "complexfield": ["{\"name\": \"package1\"}", "{\"name\": \"package2\"}"]}
You can use the default parameter of json.dump:
def default(obj):
if isinstance(obj, Library):
return {"name": obj.name}
raise TypeError(f"Can't serialize {obj!r}.")
json.dumps(data, default=default)

django postgresql json field schema validation

I have a django model with a JSONField (django.contrib.postgres.fields.JSONField)
Is there any way that I can validate model data against a json schema file?
(pre-save)
Something like my_field = JSONField(schema_file=my_schema_file)
I wrote a custom validator using jsonschema in order to do this.
project/validators.py
import django
from django.core.validators import BaseValidator
import jsonschema
class JSONSchemaValidator(BaseValidator):
def compare(self, value, schema):
try:
jsonschema.validate(value, schema)
except jsonschema.exceptions.ValidationError:
raise django.core.exceptions.ValidationError(
'%(value)s failed JSON schema check', params={'value': value}
)
project/app/models.py
from django.db import models
from project.validators import JSONSchemaValidator
MY_JSON_FIELD_SCHEMA = {
'schema': 'http://json-schema.org/draft-07/schema#',
'type': 'object',
'properties': {
'my_key': {
'type': 'string'
}
},
'required': ['my_key']
}
class MyModel(models.Model):
my_json_field = models.JSONField(
default=dict,
validators=[JSONSchemaValidator(limit_value=MY_JSON_FIELD_SCHEMA)]
)
That's what the Model.clean() method is for (see docs). Example:
class MyData(models.Model):
some_json = JSONField()
...
def clean(self):
if not is_my_schema(self.some_json):
raise ValidationError('Invalid schema.')
you could use cerberus to validate your data against a schema
from cerberus import Validator
schema = {'name': {'type': 'string'}}
v = Validator(schema)
data = {'name': 'john doe'}
v.validate(data) # returns "True" (if passed)
v.errors # this would return the error dict (or on empty dict in case of no errors)
it's pretty straightforward to use (also due to it's good documentation -> validation rules: http://docs.python-cerberus.org/en/stable/validation-rules.html)
I wrote a custom JSONField that extends models.JSONField and validates attribute's value by using jsonschema (Django 3.1, Python 3.7).
I didn't use the validators parameter for one reason: I want to let users define the schema dynamically.So I use a schema parameter, that should be:
None (by default): the field will behave like its parent class (no JSON schema validation support).
A dict object. This option is suitable for a small schema definition (for example: {"type": "string"});
A str object that describes a path to the file where the schema code is contained. This option is suitable for a big schema definition (to preserve the beauty of the model class definition code). For searching I use all enabled finders: django.contrib.staticfiles.finders.find().
A function that takes a model instance as an argument and returns a schema as dict object. So you can build a schema based on the state of the given model instance. The function will be called every time when the validate() is called.
myapp/models/fields.py
import json
from jsonschema import validators as json_validators
from jsonschema import exceptions as json_exceptions
from django.contrib.staticfiles import finders
from django.core import checks, exceptions
from django.db import models
from django.utils.functional import cached_property
class SchemaMode:
STATIC = 'static'
DYNAMIC = 'dynamic'
class JSONField(models.JSONField):
"""
A models.JSONField subclass that supports the JSON schema validation.
"""
def __init__(self, *args, schema=None, **kwargs):
if schema is not None:
if not(isinstance(schema, (bool, dict, str)) or callable(schema)):
raise ValueError('The "schema" parameter must be bool, dict, str, or callable object.')
self.validate = self._validate
else:
self.__dict__['schema_mode'] = False
self.schema = schema
super().__init__(*args, **kwargs)
def check(self, **kwargs):
errors = super().check(**kwargs)
if self.schema_mode == SchemaMode.STATIC:
errors.extend(self._check_static_schema(**kwargs))
return errors
def _check_static_schema(self, **kwargs):
try:
schema = self.get_schema()
except (TypeError, OSError):
return [
checks.Error(
f"The file '{self.schema}' cannot be found.",
hint="Make sure that 'STATICFILES_DIRS' and 'STATICFILES_FINDERS' settings "
"are configured correctly.",
obj=self,
id='myapp.E001',
)
]
except json.JSONDecodeError:
return [
checks.Error(
f"The file '{self.schema}' contains an invalid JSON data.",
obj=self,
id='myapp.E002'
)
]
validator_cls = json_validators.validator_for(schema)
try:
validator_cls.check_schema(schema)
except json_exceptions.SchemaError:
return [
checks.Error(
f"{schema} must be a valid JSON Schema.",
obj=self,
id='myapp.E003'
)
]
else:
return []
def deconstruct(self):
name, path, args, kwargs = super().deconstruct()
if self.schema is not None:
kwargs['schema'] = self.schema
return name, path, args, kwargs
#cached_property
def schema_mode(self):
if callable(self.schema):
return SchemaMode.DYNAMIC
return SchemaMode.STATIC
#cached_property
def _get_schema(self):
if callable(self.schema):
return self.schema
elif isinstance(self.schema, str):
with open(finders.find(self.schema)) as fp:
schema = json.load(fp)
else:
schema = self.schema
return lambda obj: schema
def get_schema(self, obj=None):
"""
Return schema data for this field.
"""
return self._get_schema(obj)
def _validate(self, value, model_instance):
super(models.JSONField, self).validate(value, model_instance)
schema = self.get_schema(model_instance)
try:
json_validators.validate(value, schema)
except json_exceptions.ValidationError as e:
raise exceptions.ValidationError(e.message, code='invalid')
Usage:
myapp/models/__init__.py
def schema(instance):
schema = {}
# Here is your code that uses the other
# instance's fields to create a schema.
return schema
class JSONSchemaModel(models.Model):
dynamic = JSONField(schema=schema, default=dict)
from_dict = JSONField(schema={'type': 'object'}, default=dict)
# A static file: myapp/static/myapp/schema.json
from_file = JSONField(schema='myapp/schema.json', default=dict)
Another solution using jsonschema for simple cases.
class JSONValidatedField(models.JSONField):
def __init__(self, *args, **kwargs):
self.props = kwargs.pop('props')
self.required_props = kwargs.pop('required_props', [])
super().__init__(*args, **kwargs)
def validate(self, value, model_instance):
try:
jsonschema.validate(
value, {
'schema': 'http://json-schema.org/draft-07/schema#',
'type': 'object',
'properties': self.props,
'required': self.required_props
}
)
except jsonschema.exceptions.ValidationError:
raise ValidationError(
f'Value "{value}" failed schema validation.')
class SomeModel(models.Model):
my_json_field = JSONValidatedField(
props={
'foo': {'type': 'string'},
'bar': {'type': 'integer'}
},
required_props=['foo'])

Python 3 Tornado bytes and JSON issues

I stumbled into python 3, and specifically into tornado framework.
My task was to integrate facebook authentification, and i used test cases from here:
https://github.com/tornadoweb/tornado/tree/master/demos/facebook
So the point is that user is a dictionary with bytes data.
class AuthLoginHandler(BaseHandler, tornado.auth.FacebookGraphMixin):
#tornado.web.asynchronous
def get(self):
....
def _on_auth(self, user):
if not user:
raise tornado.web.HTTPError(500, "Facebook auth failed")
self.set_secure_cookie("fbdemo_user", tornado.escape.json_encode(user))
self.redirect(self.get_argument("next", "/"))
_on_auth always produces this Error: b'token or sesion_expire data here' is not JSON serializable
Ive come out with few solitons found on stackoverflow:
Fix the data before encode
import collections.abc
def convert(data):
'''
Converts bytes data into unicode strings, so this can be encoded into JSON
'''
if isinstance(data, str):
return str(data)
elif isinstance(data, bytes):
return data.decode('utf-8')
elif isinstance(data, collections.abc.Mapping):
return dict(map(convert, data.items()))
elif isinstance(data, collections.abc.Iterable):
return type(data)(map(convert, data))
else:
return data
# ... and somewhere in the code
tornado.escape.json_encode(convert(user))
And the next one is to extend the json itself:
import json
class JSONEncoder(json.JSONEncoder):
def default(self, o):
if isinstance(o, bytes):
return o.decode('utf-8')
return json.JSONEncoder.default(self, o)
Now the question: why are there such an isses with data like type(data) == <class 'bytes'>, and am i doing it right?
Thank you
Better late than never.
Python 3 json encoder does not accept byte strings. Tornado provides a method to_basestring which can be used to overcome this problem.
Here's what the source doc says about the issue:
In python2, byte and unicode strings are mostly interchangeable, so
functions that deal with a user-supplied argument in combination with
ascii string constants can use either and should return the type the
user supplied. In python3, the two types are not interchangeable, so
this method is needed to convert byte strings to unicode.
Usage:
tornado.escape.to_basestring(value)

Specializing JSON object encoding with python 3

I'm having some troubles due to the changes to dict.values() and keys() in Python3.
My old code was something like this:
import json
class ComplexEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, complex):
return [obj.real, obj.imag]
return json.JSONEncoder.default(self, obj)
a = { '1' : 2 + 1j, '2' : 4 + 2j }
print(json.dumps(a.values(), cls=ComplexEncoder))
This on Python 3.3+ raise the exception:
TypeError: dict_values([(2+1j), (4+2j)]) is not JSON serializable
The easy workaround is to do list(a.values()), the problem for me is that I have a lot instances like that in the code. Is there a way to extend the ComplexEncoder in order to iterate over the
view?
You could encode the iterable as a list:
class IterEncoder(json.JSONEncoder):
def default(self, obj):
try:
return list(obj)
except TypeError:
return super().default(obj)
class ComplexEncoder(IterEncoder):
def default(self, obj):
if isinstance(obj, complex):
return [obj.real, obj.imag]
return super().default(obj)