I have a large object that is read from a binary file using struct.unpack and some of the values are character arrays which are read as bytes.
Since the character arrays in Python 3 are read as bytes instead of string (like in Python 2) they cannot be directly passed to json.dumps since "bytes" are not JSON serializable.
Is there any way to go from unpacked struct to json without searching through each value and converting the bytes to strings?
You can use a custom encoder in this case. See below
import json
x = {}
x['bytes'] = [b"i am bytes", "test"]
x['string'] = "strings"
x['unicode'] = u"unicode string"
class MyEncoder(json.JSONEncoder):
    def default(self, o):
        if type(o) is bytes:
            return o.decode("utf-8")
        return super(MyEncoder, self).default(o)
print(json.dumps(x, cls=MyEncoder))
# {"bytes": ["i am bytes", "test"], "string": "strings", "unicode": "unicode string"}
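To tie this back to struct.unpack, here is a minimal sketch. The record layout "<i8sf" and the field names id, name and value are made up for illustration, and stripping the NUL padding that struct adds to fixed-width character arrays is an assumption about what you want. Note that default() is only consulted for values, not for dict keys, so bytes keys would still need converting by hand.

```python
import json
import struct


class BytesEncoder(json.JSONEncoder):
    def default(self, o):
        # Called only for objects json cannot serialize on its own
        if isinstance(o, bytes):
            # Assumption: fixed-width char arrays are NUL-padded UTF-8 text
            return o.rstrip(b"\x00").decode("utf-8")
        return super().default(o)


# Hypothetical layout: little-endian int, 8-byte char array, float
record_format = "<i8sf"
packed = struct.pack(record_format, 7, b"abc", 1.5)

fields = struct.unpack(record_format, packed)
record = {"id": fields[0], "name": fields[1], "value": fields[2]}

encoded = json.dumps(record, cls=BytesEncoder)
print(encoded)
```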
I know I can use ruamel.yaml to load a file with tags in it, but when I want to dump the data without them I get an error. Simplified example:
from ruamel.yaml import YAML
from json import dumps
import sys
yaml = YAML()
data = yaml.load(
"""
!mytag
a: 1
b: 2
c: 2022-05-01
"""
)
try:
    yaml2 = YAML(typ='safe', pure=True)
    yaml.default_flow_style = True
    yaml2.dump(data, sys.stdout)
except Exception as e:
    print('exception dumping using yaml', e)
try:
    print(dumps(data))
except Exception as e:
    print('exception dumping using json', e)
exception dumping using yaml cannot represent an object: ordereddict([('a', 1), ('b', 2), ('c', datetime.date(2022, 5, 1))])
exception dumping using json Object of type date is not JSON serializable
I cannot change the load() without getting an error on the tag. How can I get output with the tags stripped (YAML or JSON)?
You get the error because neither the safe dumper (pure or not) nor JSON knows about the ruamel.yaml internal
types that preserve comments, tagging, block/flow style, etc.
Dumping as YAML, you could register these types with alternate dump methods. As JSON this is more complex,
as AFAIK you can only convert the leaf nodes (i.e. the YAML scalars; you would, e.g., be
able to use that to dump the datetime.date instance that is loaded as the value of key c).
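As a rough illustration of converting only the leaf nodes, the standard json module accepts a default callable that is invoked for values it cannot serialize. The sketch below uses a plain dict as a stand-in for the loaded data, and the helper name encode_leaf is hypothetical. It handles the date value, but it does nothing for non-string keys or for the tag information.

```python
import datetime
import json


def encode_leaf(obj):
    # Invoked by json only for values it cannot serialize itself
    if isinstance(obj, (datetime.date, datetime.datetime)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Plain dict standing in for the loaded YAML mapping
data = {"a": 1, "b": 2, "c": datetime.date(2022, 5, 1)}

text = json.dumps(data, default=encode_leaf)
print(text)
```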
I have used YAML as a readable, editable and programmatically updatable config file, with
a much faster-loading JSON version of the data used if its file is not older than the corresponding YAML (if
it is older, the JSON gets created from the YAML). The thing to do in order to dump(s) is to
recursively generate Python primitives that JSON understands.
The following does so, but there are other constructs besides datetime
instances that JSON doesn't allow, e.g. sequences or dicts used
as keys (which is allowed in YAML, but not in JSON). For keys that are
sequences I concatenate the string representations of the elements:
import sys
import datetime
import json
from collections.abc import Mapping
import ruamel.yaml  # needed for the ruamel.yaml.scalarbool / ruamel.yaml.comments lookups below
from ruamel.yaml import YAML
yaml = YAML()
data = yaml.load("""\
!mytag
a: 1
b: 2
c: 2022-05-01
[d, e]: !myseq [42, 196]
{f: g, 18: y}: !myscalar x
""")
def json_dump(data, out, indent=None):
    def scalar(obj):
        if obj is None:
            return None
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return str(obj)
        if isinstance(obj, ruamel.yaml.scalarbool.ScalarBoolean):
            return obj == 1
        if isinstance(obj, bool):
            return bool(obj)
        if isinstance(obj, int):
            return int(obj)
        if isinstance(obj, float):
            return float(obj)
        if isinstance(obj, tuple):
            return '_'.join([str(x) for x in obj])
        if isinstance(obj, Mapping):
            return '_'.join([f'{k}-{v}' for k, v in obj.items()])
        if not isinstance(obj, str): print('type', type(obj))
        return obj
    def prep(obj):
        if isinstance(obj, dict):
            return {scalar(k): prep(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [prep(elem) for elem in obj]
        if isinstance(obj, ruamel.yaml.comments.TaggedScalar):
            return prep(obj.value)
        return scalar(obj)
    res = prep(data)
    json.dump(res, out, indent=indent)
json_dump(data, sys.stdout, indent=2)
which gives:
{
  "a": 1,
  "b": 2,
  "c": "2022-05-01",
  "d_e": [
    42,
    196
  ],
  "f-g_18-y": "x"
}
I am trying to read JSON from a file, get some values, transform them and write them back to a new file.
{
  "metadata": {
    "info": "important info"
  },
  "timestamp": "2018-04-06T12:19:38.611Z",
  "content": {
    "id": "1",
    "name": "name test",
    "objects": [
      {
        "id": "1",
        "url": "http://example.com",
        "properties": [
          {
            "id": "1",
            "value": "1"
          }
        ]
      }
    ]
  }
}
Above is the JSON that I read from the file.
Below is a Python program that gets the values, creates new JSON and writes it to a file.
import json
from pprint import pprint
def load_json(file_name):
    return json.load(open(file_name))
def get_metadata(json):
    return json["metadata"]
def get_timestamp(json):
    return json["timestamp"]
def get_content(json):
    return json["content"]
def create_json(metadata, timestamp, content):
    dct = dict(__metadata=metadata, timestamp=timestamp, content=content)
    return json.dumps(dct)
def write_json_to_file(file_name, json_content):
    with open(file_name, 'w') as file:
        json.dump(json_content, file)
STACK_JSON = 'stack.json';
STACK_OUT_JSON = 'stack-out.json'
if __name__ == '__main__':
    json_content = load_json(STACK_JSON)
    print("Loaded JSON:")
    print(json_content)
    metadata = get_metadata(json_content)
    print("Metadata:", metadata)
    timestamp = get_timestamp(json_content)
    print("Timestamp:", timestamp)
    content = get_content(json_content)
    print("Content:", content)
    created_json = create_json(metadata, timestamp, content)
    print("\n\n")
    print(created_json)
    write_json_to_file(STACK_OUT_JSON, created_json)
But the problem is that the created JSON is not correct. As the final result I get:
"{\"__metadata\": {\"info\": \"important info\"}, \"timestamp\": \"2018-04-06T12:19:38.611Z\", \"content\": {\"id\": \"1\", \"name\": \"name test\", \"objects\": [{\"id\": \"1\", \"url\": \"http://example.com\", \"properties\": [{\"id\": \"1\", \"value\": \"1\"}]}]}}"
That is not what I want to achieve; it's not correct JSON. What am I doing wrong?
Solution:
Change the write_json_to_file(...) method like this:
def write_json_to_file(file_name, json_content):
    with open(file_name, 'w') as file:
        file.write(json_content)
Explanation:
The problem is that when you call write_json_to_file(STACK_OUT_JSON, created_json) at the end of your script, the variable created_json contains a string: the JSON representation of the dictionary created in the create_json(...) function. But inside write_json_to_file(file_name, json_content), you call:
json.dump(json_content, file)
You're telling the json module to write the JSON representation of the variable json_content (which contains a string) into the file. The JSON representation of a string is a single value enclosed in double quotes ("), with every double quote it contains escaped by \.
What you want to achieve is to simply write the value of the json_content variable into the file and not have it first JSON-serialized again.
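The double encoding is easy to reproduce in isolation. This is a minimal sketch with a made-up payload; the variable names once and twice are only for illustration:

```python
import json

payload = {"info": "important info"}

once = json.dumps(payload)   # JSON text describing the dict
twice = json.dumps(once)     # JSON text describing a *string*

print(once)
print(twice)

# Decoding the doubly encoded text once only gets you the inner string back;
# it takes a second decode to reach the dict.
inner = json.loads(twice)
restored = json.loads(inner)
```

Here twice starts and ends with a double quote and has every inner quote escaped, which is the shape of the output in the question.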
Problem
You're converting a dict into JSON and then, right before you write it to a file, you're converting it into JSON again. When you serialize a JSON string a second time, the whole string is treated as a single value, so every " inside it gets escaped as \".
How to solve it?
It's a great idea to read the JSON file, convert it into a dict and perform all your operations on that. Convert to JSON only when you want to print output, write to a file or return a result, since json.dump() is expensive: it adds approximately 2 ms of overhead, which might not seem like much, but when your code runs in 500 microseconds that is four times the runtime.
Other Recommendations
After seeing your code, I suspect you're coming from a Java background. In Java, getThis() or getThat() methods are a good way to modularize code, since code there is organized in classes; in Python, such one-line getter functions mostly hurt readability (see the PEP 8 style guide for Python).
I've updated the code below:
import json
def get_contents_from_json(file_path) -> dict:
    """
    Reads the contents of the json file into a dict
    :param file_path:
    :return: A dictionary of all contents in the file.
    """
    try:
        with open(file_path) as file:
            contents = file.read()
        return json.loads(contents)
    except json.JSONDecodeError:
        print('Error while reading json file')
    except FileNotFoundError:
        print(f'The JSON file was not found at the given path: \n{file_path}')
def write_to_json_file(metadata, timestamp, content, file_path):
    """
    Creates a dict of all the data and then writes it into the file
    :param metadata: The meta data
    :param timestamp: the timestamp
    :param content: the content
    :param file_path: The file in which json needs to be written
    :return: None
    """
    output_dict = dict(metadata=metadata, timestamp=timestamp, content=content)
    with open(file_path, 'w') as outfile:
        json.dump(output_dict, outfile, sort_keys=True, indent=4, ensure_ascii=False)
def main(input_file_path, output_file_path):
    # get a dict from the loaded json
    data = get_contents_from_json(input_file_path)
    # the print() supports multiple args so you don't need multiple print statements
    print('JSON:', json.dumps(data), 'Loaded JSON as dict:', data, sep='\n')
    try:
        # load your data from the dict instead of the methods since it's more pythonic
        metadata = data['metadata']
        timestamp = data['timestamp']
        content = data['content']
        # just cumulating your print statements
        print("Metadata:", metadata, "Timestamp:", timestamp, "Content:", content, sep='\n')
        # write your json to the file.
        write_to_json_file(metadata, timestamp, content, output_file_path)
    except KeyError:
        print('Could not find proper keys in the provided json')
    except TypeError:
        print('There is something wrong with the loaded data')
if __name__ == '__main__':
    main('stack.json', 'stack-out.json')
Advantages of the above code:
More Modular and hence easily unit testable
Handling of exceptions
Readable
More pythonic
Comments because they are just awesome!
Dumping JSON using YAML,
c= {"a":1}
d = yaml.dump(c)
Loading JSON using YAML
yaml.load(d)
{'a': 1} # double quotes is lost
How can I ensure that the output of the load keeps the double quotes?
Note: I tried json and simplejson also, all behave the same way.
For Python there is no difference between single and double quotes.
If you need the response as a JSON string, use the standard json module. It will create a string with correctly formatted JSON, with double quotes.
>>> import json
>>> json.dumps({'a': 1})
'{"a": 1}'
Some frameworks or modules (such as requests) have built-in functions to
send correctly formatted JSON (they may use the standard json module in the background), so you don't have to do it on your own.
This
c = {"a":1}
d = yaml.dump(c)
doesn't dump JSON, it dumps a Python dict as YAML. Use json.dumps() to make a JSON string from the dict and then optionally load/dump as YAML and preserve the double quotes by specifying preserve_quotes while loading:
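import sys
import json
import ruamel.yaml
c = {"a": 1}
json_string = json.dumps(c)
print(json_string)
print('---------')
# round_trip_load()/round_trip_dump() are deprecated in recent ruamel.yaml
# releases; a round-trip YAML() instance does the same job
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(json_string)
data['a'] = 3
yaml.dump(data, sys.stdout)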
that will print:
{"a": 1}
---------
{"a": 3}
I stumbled into Python 3, and specifically into the Tornado framework.
My task was to integrate Facebook authentication, and I used the demo from here:
https://github.com/tornadoweb/tornado/tree/master/demos/facebook
The point is that user is a dictionary containing bytes data.
class AuthLoginHandler(BaseHandler, tornado.auth.FacebookGraphMixin):
    #tornado.web.asynchronous
    def get(self):
        ....
    def _on_auth(self, user):
        if not user:
            raise tornado.web.HTTPError(500, "Facebook auth failed")
        self.set_secure_cookie("fbdemo_user", tornado.escape.json_encode(user))
        self.redirect(self.get_argument("next", "/"))
_on_auth always produces this error: b'token or sesion_expire data here' is not JSON serializable
I've come up with a few solutions found on Stack Overflow:
Fix the data before encoding:
import collections.abc
def convert(data):
    '''
    Converts bytes data into unicode strings, so this can be encoded into JSON
    '''
    if isinstance(data, str):
        return str(data)
    elif isinstance(data, bytes):
        return data.decode('utf-8')
    elif isinstance(data, collections.abc.Mapping):
        return dict(map(convert, data.items()))
    elif isinstance(data, collections.abc.Iterable):
        return type(data)(map(convert, data))
    else:
        return data
# ... and somewhere in the code
tornado.escape.json_encode(convert(user))
And the next one is to extend the JSON encoder itself:
import json
class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, bytes):
            return o.decode('utf-8')
        return json.JSONEncoder.default(self, o)
Now the question: why are there such issues with data where type(data) == <class 'bytes'>, and am I doing it right?
Thank you
Better late than never.
The Python 3 json encoder does not accept byte strings. Tornado provides a function, to_basestring, which can be used to overcome this problem.
Here's what the source doc says about the issue:
In python2, byte and unicode strings are mostly interchangeable, so
functions that deal with a user-supplied argument in combination with
ascii string constants can use either and should return the type the
user supplied. In python3, the two types are not interchangeable, so
this method is needed to convert byte strings to unicode.
Usage:
tornado.escape.to_basestring(value)
I'm using json.dump() and json.load() to save/read a dictionary of strings to/from disk. The issue is that I can't have any of the strings in unicode. They seem to be in unicode no matter how I set the parameters to dump/load (including ensure_ascii and encoding).
If you are just dealing with simple JSON objects, you can use the following:
def ascii_encode_dict(data):
    ascii_encode = lambda x: x.encode('ascii')
    return dict(map(ascii_encode, pair) for pair in data.items())
json.loads(json_data, object_hook=ascii_encode_dict)
Here is an example of how it works:
>>> json_data = '{"foo": "bar", "bar": "baz"}'
>>> json.loads(json_data) # old call gives unicode
{u'foo': u'bar', u'bar': u'baz'}
>>> json.loads(json_data, object_hook=ascii_encode_dict) # new call gives str
{'foo': 'bar', 'bar': 'baz'}
This answer works for a more complex JSON structure, and gives some nice explanation on the object_hook parameter. There is also another answer there that recursively takes the result of a json.loads() call and converts all of the Unicode strings to byte strings.
And if the json object is a mix of datatypes, not only unicode strings, you can use this expression:
def ascii_encode_dict(data):
    ascii_encode = lambda x: x.encode('ascii') if isinstance(x, unicode) else x
    return dict(map(ascii_encode, pair) for pair in data.items())
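The snippets above are Python 2 code (unicode does not exist in Python 3, where json.loads() already returns str). If the same object_hook idea is wanted on Python 3, a rough adaptation could look like the sketch below. Treating "ASCII byte strings" as the desired result is an assumption, and the sample document is made up. The hook runs once per decoded JSON object, innermost first, so nested dicts are covered, but strings inside lists are not touched.

```python
import json


def ascii_encode_dict(data):
    # Encode str keys and values to ASCII bytes, leave everything else alone
    def ascii_encode(x):
        return x.encode("ascii") if isinstance(x, str) else x

    return {ascii_encode(k): ascii_encode(v) for k, v in data.items()}


json_data = '{"foo": "bar", "n": 1, "inner": {"k": "v"}}'
result = json.loads(json_data, object_hook=ascii_encode_dict)
print(result)
```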