Using JSON keys as attributes in nested JSON - json

I'm working with nested JSON-like data structures in Python 2.7 that I exchange with some foreign Perl code. I just want to 'work with' these nested structures of lists and dictionaries in a more pythonic way.
So if I have a structure like this...
a = {
    'x': 4,
    'y': [2, 3, { 'a': 55, 'b': 66 }],
}
...I want to be able to deal with it in a python script as if it were nested python classes/structs, like this:
>>> aa = j2p(a)  # <<- this is what I'm after.
>>> print aa.x
4
>>> aa.z = 99
>>> print a
{
    'x': 4,
    'y': [2, 3, { 'a': 55, 'b': 66 }],
    'z': 99
}
>>> aa.y[2].b = 999
>>> print a
{
    'x': 4,
    'y': [2, 3, { 'a': 55, 'b': 999 }],
    'z': 99
}
Thus aa is a proxy into the original structure. This is what I came up with so far, inspired by the excellent What is a metaclass in Python? question.
def j2p(x):
    """j2p creates a pythonic interface to nested arrays and
    dictionaries, as returned by json readers.

    >>> a = { 'x':[5,8], 'y':5}
    >>> aa = j2p(a)
    >>> aa.y=7
    >>> print a
    {'x': [5, 8], 'y':7}
    >>> aa.x[1]=99
    >>> print a
    {'x': [5, 99], 'y':7}
    >>> aa.x[0] = {'g':5, 'h':9}
    >>> print a
    {'x': [ {'g':5, 'h':9} , 99], 'y':7}
    >>> print aa.x[0].g
    5
    """
    if isinstance(x, list):
        return _list_proxy(x)
    elif isinstance(x, dict):
        return _dict_proxy(x)
    else:
        return x

class _list_proxy(object):
    def __init__(self, proxied_list):
        object.__setattr__(self, 'data', proxied_list)

    def __getitem__(self, a):
        return j2p(object.__getattribute__(self, 'data').__getitem__(a))

    def __setitem__(self, a, v):
        return object.__getattribute__(self, 'data').__setitem__(a, v)

class _dict_proxy(_list_proxy):
    def __init__(self, proxied_dict):
        _list_proxy.__init__(self, proxied_dict)

    def __getattribute__(self, a):
        return j2p(object.__getattribute__(self, 'data').__getitem__(a))

    def __setattr__(self, a, v):
        return object.__getattribute__(self, 'data').__setitem__(a, v)

def p2j(x):
    """p2j gives back the underlying json-ic nested
    dictionary/list structure of an object or attribute created with
    j2p.
    """
    if isinstance(x, (_list_proxy, _dict_proxy)):
        return object.__getattribute__(x, 'data')
    else:
        return x
Now I wonder whether there is an elegant way of mapping a whole set of the __*__ special functions, like __iter__ and __delitem__, so that I don't need to unwrap things using p2j() just to iterate or do other pythonic stuff.
# today:
for i in p2j(aa.y):
    print i

# would like to...
for i in aa.y:
    print i

I think you're making this more complex than it needs to be. If I understand you correctly, all you should need to do is this:
import json

class Struct(dict):
    def __getattr__(self, name):
        return self[name]

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        del self[name]

j = '{"y": [2, 3, {"a": 55, "b": 66}], "x": 4}'
aa = json.loads(j, object_hook=Struct)

for i in aa.y:
    print(i)
When you load JSON, the object_hook parameter lets you specify a callable to process each object the decoder produces. I've just used it to turn the dict into an object that allows attribute access to its keys (see the json module docs for object_hook).
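Because object_hook is applied to every JSON object as it is decoded, nested objects come back as Structs too, so chained attribute access works with no extra wrapping. A quick illustration (my addition, reusing the aa loaded above):

print(aa.x)       # 4
print(aa.y[2].b)  # 66
aa.z = 99         # Struct is still a dict, so this just adds a new key
print(aa['z'])    # 99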

There is an attrdict library that does exactly that in a very safe manner, but if you want, a quick and dirty (possibly leaking memory) approach was given in this answer:
import json

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

j = '{"y": [2, 3, {"a": 55, "b": 66}], "x": 4}'
aa = json.loads(j, object_hook=AttrDict)
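A quick check (my own sketch, not from the attrdict docs): because object_hook converts every nested JSON object, attribute access and key access both work at any depth:

print(aa.x)             # 4
print(aa.y[2].a)        # 55
print(aa['y'][2]['b'])  # 66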

I found the answer: there is intentionally no way to automatically map the special methods via __getattribute__, because implicit special method lookup on new-style classes goes through the type, not the instance. So to achieve what I want, I need to explicitly define every special method I care about, like __len__, one after the other.
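On _list_proxy, for example, that would look roughly like this (my sketch of what defining them "one after the other" means; only a few methods shown):

    # added to the existing _list_proxy from the question
    def __len__(self):
        return len(object.__getattribute__(self, 'data'))

    def __iter__(self):
        for item in object.__getattribute__(self, 'data'):
            yield j2p(item)

    def __delitem__(self, a):
        del object.__getattribute__(self, 'data')[a]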

Related

How do I divide each element in a list by an int using a function?

I have a dictionary that contains lists as values, and I want to divide each element in those lists by a constant. How can I do that using a def function?
Assuming you're using Python and that I understood your question, a simple way of doing that:
import numpy as np

def divide(input_list, divisor):
    return list(np.array(input_list) / divisor)
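Example usage (my addition; note that NumPy's true division gives float results):

print(divide([2, 4, 6], 2))  # the elements come back as NumPy floats: 1.0, 2.0, 3.0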
You'd probably want to use something like:
CONSTANT_K = 2
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

for value in data.values():
    quotientResult = value / CONSTANT_K
    # do whatever you want with the quotient result here
Where CONSTANT_K is your constant. Then, you iterate through your dictionary values in a for loop. The loop takes each value in the dictionary and divides it by the constant. You can handle the values inside of the for loop, or you can store them inside a new dictionary or array.
You can put this into a def function by doing:
CONSTANT_K = 2
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

def divideDict(k, dictA):
    for value in dictA.values():
        quotientResult = value / k
        # do whatever you want with the quotient result here

divideDict(CONSTANT_K, data)
Where divideDict() is your function. If you're looking for a dictionary containing lists, you'll have to loop through the lists as well:
CONSTANT_K = 2
data = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]}

def divideDict(k, dictA):
    for value in dictA.values():
        for val in value:
            quotientResult = val / k
            # do whatever you want with the quotient result here

divideDict(CONSTANT_K, data)
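If you want the divided values back instead of just computing them inside the loop, a version that builds and returns a new dictionary of lists could look like this (my sketch, not part of the original answer; assumes Python 3 division):

def divide_dict_of_lists(k, dictA):
    # build a new dict with every element of every list divided by k
    return {key: [val / k for val in values] for key, values in dictA.items()}

print(divide_dict_of_lists(2, {'a': [1, 2], 'b': [3, 4]}))
# {'a': [0.5, 1.0], 'b': [1.5, 2.0]}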

How to stop DjangoJSONEncoder from truncating microseconds datetime objects?

I have a dictionary with a datetime object inside it, and when I try to JSON-dump it, Django truncates the microseconds to milliseconds:
> dikt
{'date': datetime.datetime(2020, 6, 22, 11, 36, 25, 763835, tzinfo=<DstTzInfo 'Africa/Nairobi' EAT+3:00:00 STD>)}
> json.dumps(dikt, cls=DjangoJSONEncoder)
'{"date": "2020-06-22T11:36:25.763+03:00"}'
How can I preserve all the 6 microsecond digits?
DjangoJSONEncoder follows the ECMA-262 specification, which keeps only milliseconds.
You can easily overcome this by introducing your own custom encoder.
import datetime
import json

from django.core.serializers.json import DjangoJSONEncoder

class MyCustomEncoder(DjangoJSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            r = obj.isoformat()
            if r.endswith('+00:00'):
                r = r[:-6] + 'Z'
            return r
        return super(MyCustomEncoder, self).default(obj)

datetime_object = datetime.datetime.now()
print(datetime_object)
print(json.dumps(datetime_object, cls=MyCustomEncoder))

# output:
# 2020-06-22 11:54:29.127120
# "2020-06-22T11:54:29.127120"

Nested classes are not serializable in python when trying JSON dump

I currently have two classes in Python like these:
class person:
    age = ""
    name = ""
    ranking = {}

    def addRanking():
        # Do whatever treatment and add to the ranking dict

class ranking:
    semester = ""
    position = ""
    gpa = ""
I have my list of person objects in a dictionary called dictP, and I json.dump() this dictionary, but it seems that it doesn't work. Here is my function to dump to JSON:
def toJson():
    jsonfile = open('dict.json', 'w')
    print(json.dump(listP, jsonfile))
I get the famous: is not JSON serializable error.
Would you know what I can do to fix this problem? I thought that having two dictionaries (which are serializable) would avoid this kind of issue, but apparently not.
Thanks in advance
Edit:
Here is an example (typed on my phone sorry for typos, I'm not sure it does run but it's so you get the idea):
class person:
    age = ""
    name = ""
    ranking = {}

    def __init__(self, age, name):
        self.age = age
        self.name = name
        self.ranking = {}

    def addRanking(self, semester, position, gpa):
        # if the semester is not already present in the data for that person
        self.ranking[semester] = make_ranking(semester, position, gpa)

class ranking:
    semester = ""
    position = ""
    gpa = ""

    def __init__(self, semester, position, gpa):
        self.semester = semester
        self.position = position
        self.gpa = gpa

dictP = {}

def make_person(age, name):
    # Some stuff happens there
    return person(age, name)

def make_ranking(semester, position, gpa):
    # some computation there
    return ranking(semester, position, gpa)

def pretending_to_read_csv():
    age = 12
    name = "Alice"
    p = make_person(age, name)
    dictP["1"] = p

    age = 13
    name = "Alice"
    p = make_person(age, name)
    dictP["2"] = p

    # We read a csv for ranking that gives us an ID
    semester = 1
    position = 4
    gpa = 3.2
    id = 1
    dictP["1"].addRanking(semester, position, gpa)

    semester = 2
    position = 4
    gpa = 3.2
    id = 1
    dictP["1"].addRanking(semester, position, gpa)
For a dictionary to be serializable, note that all the keys & values in that dictionary must be serializable as well. You did not show us what listP contains, but I'm guessing it's something like this:
>>> listP
[<__main__.person instance at 0x107b65290>, <__main__.person instance at 0x107b65368>]
Instances of user-defined Python classes are not serializable by the default JSON encoder.
I think you want a list of dictionaries, which would look like this:
>>> listP
[{'ranking': {}, 'age': 10, 'name': 'fred'}, {'ranking': {}, 'age': 20, 'name': 'mary'}]
This would serialize as you expect:
>>> import json
>>> json.dumps(listP)
'[{"ranking": {}, "age": 10, "name": "fred"}, {"ranking": {}, "age": 20, "name": "mary"}]'
UPDATE
(Thanks for adding example code.)
>>> pretending_to_read_csv()
>>> dictP
{'1': <__main__.person instance at 0x107b65368>, '2': <__main__.person instance at 0x107b863b0>}
Recall that user-defined classes cannot be serialized automatically. It's possible to extend the JSONEncoder directly to handle these cases, but all you really need is a function that can turn your object into a dictionary comprised entirely of primitives.
def convert_ranking(ranking):
    return {
        "semester": ranking.semester,
        "position": ranking.position,
        "gpa": ranking.gpa}

def convert_person(person):
    return {
        "age": person.age,
        "name": person.name,
        "ranking": {semester: convert_ranking(ranking)
                    for semester, ranking in person.ranking.iteritems()}}
One more dictionary comprehension to actually do the conversion and you're all set:
>>> new_dict = {person_id: convert_person(person) for person_id, person in dictP.iteritems()}
>>> from pprint import pprint
>>> pprint(new_dict)
{'1': {'age': 12,
       'name': 'Alice',
       'ranking': {1: {'gpa': 3.2, 'position': 4, 'semester': 1},
                   2: {'gpa': 3.2, 'position': 4, 'semester': 2}}},
 '2': {'age': 13, 'name': 'Alice', 'ranking': {}}}
Since no user-defined objects are stuffed in there, this will serialize as you hope:
>>> json.dumps(new_dict)
'{"1": {"ranking": {"1": {"position": 4, "semester": 1, "gpa": 3.2}, "2": {"position": 4, "semester": 2, "gpa": 3.2}}, "age": 12, "name": "Alice"}, "2": {"ranking": {}, "age": 13, "name": "Alice"}}'
You can try calling json.dump on the .__dict__ member of your instance. You say that you have a list of person instances, so try doing something like this:
listJSON = []
for p in listP:
    # append the dictionary holding the data of this person instance to a list
    listJSON.append(p.__dict__)

json.dump(listJSON, jsonfile)
If you are storing your person instances in a dictionary like so: dictP = {'person1': p1, 'person2': p2} this solution will loop through the keys and change their corresponding values to the __dict__ member of the instance:
for key in dictP:
    dictP[key] = dictP[key].__dict__

json.dump(dictP, jsonfile)
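Note that both snippets assume the values inside __dict__ are themselves serializable; since each person holds ranking instances, you may still hit the same error. One way around that (my sketch, not part of the original answer) is to pass a default callable, which json.dump invokes for every object it cannot serialize, including the nested ones:

def to_primitive(obj):
    # called by json.dump for anything it cannot serialize on its own
    return obj.__dict__

json.dump(dictP, jsonfile, default=to_primitive)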

Why does this function always result in a generator?

I'm confused by the following in Python 3.3:
>>> def foo(gen=False):
...     if not gen:
...         return list(range(10))
...     else:
...         for i in range(10):
...             yield i
...
>>> foo()
<generator object foo at 0xb72a016c>
>>> foo(gen=False)
<generator object foo at 0xb72a0144>
>>> foo(gen=True)
<generator object foo at 0xb72a089c>
>>>
What am I misunderstanding? If gen is False, the default value, then not gen is True, and thus I should get a list of integers [0,1,2,3,4,5,6,7,8,9]. On the other hand, if it is True, shouldn't (not gen) == False result in a generator?
The inclusion of yield in a function makes it a generator function: when you call the function, you get a generator, and no other execution takes place. The function body starts executing only when the generator is asked for elements.
def not_a_generator():
    print(1)
    print(2)

not_a_generator()
# => 1
#    2

def is_a_generator():
    print(1)
    yield 7
    print(2)

is_a_generator()
# => <generator object is_a_generator at 0x10e1471a8>

list(is_a_generator())
# => 1
#    2
#    [7]
It does not matter that you have put the yield statement in an if branch. The documentation says:
Using yield in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
However, you can achieve what you had intended by simply defining an inner generator function:
>>> def foo(gen=False):
...     if not gen:
...         return list(range(10))
...     else:
...         def foo():  # the inner generator
...             for i in range(10):
...                 yield i
...         return foo()
...
>>> foo()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> foo(gen=False)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> foo(gen=True)
<generator object foo.<locals>.foo at 0x7f350c7563b8>
>>> g = foo(gen=True)
>>> next(g)
0
This time, the yield statement turns the inner foo into a generator. The outer foo remains a normal function.
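An alternative with the same behaviour (my suggestion, not from the answer above) is to return a generator expression, which keeps yield out of foo entirely, so foo stays a normal function:

def foo(gen=False):
    if not gen:
        return list(range(10))
    # the generator expression does the yielding, not foo itself
    return (i for i in range(10))

print(foo())          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(foo(gen=True))  # <generator object <genexpr> at 0x...>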

Counting in sequences

Say I want to count, for each value in the sequence xs, how many times it appears in v, and return those counts in a list in the same order, including dups. This is the code I have so far and I'm kind of stuck on what to do. Trying to keep it simple, without .count funcs and whatnot.
def count_each(xs, v):
    count = []
    for i in range(len(xs)):
        if xs(i) == v:
            return count.append(i)
    return count
You could use the list method count().
>>> keys = [10, 20, 30]
>>> search = [10, 20, 50, 20, 40, 20]
>>> print [search.count(key) for key in keys]
[1, 3, 0]
Alternatively, in O(n):
>>> from collections import Counter
>>> c = Counter(search)
>>> print [c[key] for key in keys]
[1, 3, 0]
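Wrapped into the function signature the question asks for (my sketch; names follow the question):

from collections import Counter

def count_each(xs, v):
    c = Counter(v)
    return [c[x] for x in xs]

print(count_each([10, 20, 30], [10, 20, 50, 20, 40, 20]))  # [1, 3, 0]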
Below is a sample function to achieve this using collections.defaultdict:
from collections import defaultdict

def count_each(xs, v):
    count = defaultdict(int)
    for item in v:
        if item in xs:
            count[item] += 1
    return [count[item] for item in xs]
Or, using a simple dict:
def count_each(xs, v):
    count = {}
    for item in v:
        if item in xs:
            if item not in count:
                count[item] = 0
            count[item] += 1
    return [count.get(item, 0) for item in xs]
Sample call:
>>> count_each([10,20,30],[10,20,50,20,40,20])
[1, 3, 0]
You can use the list method count(element) to count occurrences of element in a list.
So your code can look like this:
def count_each(xs, v):
    result = []
    for element in xs:
        result.append(v.count(element))
    return result

count_each([10, 20, 30], [10, 20, 50, 20, 40, 20])
or you can write it more compactly using a list comprehension - see @mrdomoboto's answer.
EDIT: the same without count()
def count_each(xs, v):
    result = []
    for element in xs:
        count = 0
        for x in v:
            if x == element:
                count += 1
        result.append(count)
    return result

count_each([10, 20, 30], [10, 20, 50, 20, 40, 20])
BTW: you can't do this with a dict alone, because a dict doesn't have to keep order. One time you might get the result [1, 3, 0], but another time [3, 1, 0] or [0, 1, 3], etc.