Serialize base64-encoded data to JSON - json

I'm writing a script to automate data generation for a demo, and I need to serialize some data to JSON. Part of this data is an image, so I encoded it in base64, but when I run my script I get:
Traceback (most recent call last):
  File "lazyAutomationScript.py", line 113, in <module>
    json.dump(out_dict, outfile)
  File "/usr/lib/python3.4/json/__init__.py", line 178, in dump
    for chunk in iterable:
  File "/usr/lib/python3.4/json/encoder.py", line 422, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 429, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.4/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'iVBORw0KGgoAAAANSUhEUgAADWcAABRACAYAAABf7ZytAAAABGdB...
...
BF2jhLaJNmRwAAAAAElFTkSuQmCC' is not JSON serializable
As far as I know, a base64-encoded-whatever (a PNG image, in this case) is just a string, so it should pose no problem to serialize. What am I missing?

You must be careful about the datatypes.
If you read a binary image, you get bytes.
If you encode these bytes in base64, you get ... bytes again! (see the documentation on b64encode)
json can't handle raw bytes, which is why you get the error.
Here is a commented example; I hope it helps:
from base64 import b64encode
from json import dumps

ENCODING = 'utf-8'
IMAGE_NAME = 'spam.jpg'
JSON_NAME = 'output.json'

# first: read the binary content
# note the 'rb' flag
# result: bytes
with open(IMAGE_NAME, 'rb') as open_file:
    byte_content = open_file.read()

# second: base64-encode the read data
# result: bytes (again)
base64_bytes = b64encode(byte_content)

# third: decode these bytes to text
# result: string (in utf-8)
base64_string = base64_bytes.decode(ENCODING)

# optional: do something with the data
# result here: some dict
raw_data = {IMAGE_NAME: base64_string}

# now: encode the data to json
# result: string
json_data = dumps(raw_data, indent=2)

# finally: write the json string to disk
# note the 'w' flag; no 'b' needed, as we deal with text here
with open(JSON_NAME, 'w') as another_open_file:
    another_open_file.write(json_data)
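For completeness, reading the image back out of the JSON file is just the reverse trip (a minimal sketch, reusing the names from the snippet above):
from base64 import b64decode
from json import loads

# read the JSON text and parse it back into a dict
with open(JSON_NAME, 'r') as json_file:
    loaded = loads(json_file.read())

# decode the base64 string back into the original bytes
restored_bytes = b64decode(loaded[IMAGE_NAME])
assert restored_bytes == byte_content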

An alternative solution would be to encode on the fly with a custom encoder:
import json
from base64 import b64encode

class Base64Encoder(json.JSONEncoder):
    # pylint: disable=method-hidden
    def default(self, o):
        if isinstance(o, bytes):
            return b64encode(o).decode()
        return json.JSONEncoder.default(self, o)
Having that defined, you can do:
m = {'key': b'\x9c\x13\xff\x00'}
json.dumps(m, cls=Base64Encoder)
It will produce:
'{"key": "nBP/AA=="}'

What am I missing?
The error is telling you that a bytes object is not JSON serializable.
from base64 import b64encode
# *binary representation* of the base64 string
assert b64encode(b"binary content") == b'YmluYXJ5IGNvbnRlbnQ='
# base64 string
assert b64encode(b"binary content").decode('utf-8') == 'YmluYXJ5IGNvbnRlbnQ='
The latter is definitely "JSON serializable" because it is the base64 string representation of the binary b"binary content".

Related

json errors when appending data with Python

Good day.
I have a small password generator program, and I want to save the created passwords into a JSON file (appending each time) so I can add them to an SQLite3 database.
Just trying to implement the append functionality, I receive several errors that I don't understand.
Here are the errors I receive, and below that is the code itself.
I'm quite new to Python, so additional details are welcome.
Traceback (most recent call last):
  File "C:\Users\whitmech\OneDrive - Six Continents Hotels, Inc\04 - Python\02_Mosh_Python_Course\Py_Projects\PWGenerator.py", line 32, in <module>
    data = json.load(file)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
import random
import string
import sqlite3
import json
from pathlib import Path

print('hello, Welcome to Password generator!')

# input the length of password
length = int(input('\nEnter the length of password: '))

# define data
lower = string.ascii_lowercase
upper = string.ascii_uppercase
num = string.digits
symbols = string.punctuation
# string.ascii_letters

# combine the data
all = lower + upper + num + symbols

# use random
temp = random.sample(all, length)

# create the password
password = "".join(temp)

filename = 'saved.json'
entry = {password}
with open(filename, "r+") as file:
    data = json.load(file)
    data.append(entry)
    file.seek(0)
    json.dump(data, file)

# print the password
print(password)
Update: I've changed the JSON code as directed and it works, but when trying to run the SQLite3 code I'm now receiving an AttributeError.
Code:
with open(filename, "r+") as file:
    try:
        data = json.load(file)
        data.append(entry)
    except json.decoder.JSONDecodeError as e:
        data = entry
    file.seek(0)
    json.dump(data, file)

# print the password
print(password)

store = input('Would you like to store the password? ')
if store == "Yes":
    pwStored = json.loads(Path("saved.json").read_text())
    with sqlite3.connect("db.pws") as conn:
        command = "INSERT INTO Passwords VALUES (?)"
        for i in pwStored:
            conn.execute(command, tuple(i.values))  # Error with this code
        conn.commit()
else:
    exit()
Error:
AttributeError: 'str' object has no attribute 'values'
The error is raised because your JSON file is empty; you need to update the block as follows:
entry = [password]
with open(filename, "r+") as file:
    try:
        data = json.load(file)
        data.extend(entry)
    except json.decoder.JSONDecodeError as e:
        data = entry
    file.seek(0)
    json.dump(data, file)
Also, you are adding the password to a set (i.e., entry = {password}), which will again throw an error: TypeError: Object of type set is not JSON serializable.
So you need to convert that to either a list or a dict.
Note: here I have used entry as a list. The AttributeError from your update is addressed in the sketch below.
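As for the AttributeError: with the fix above, pwStored is a list of plain strings, and a string has no .values attribute. Bind each password as a one-element tuple instead (a sketch, assuming the Passwords table has a single column):
with sqlite3.connect("db.pws") as conn:
    command = "INSERT INTO Passwords VALUES (?)"
    for pw in pwStored:
        conn.execute(command, (pw,))  # one-element tuple, not pw.values
    conn.commit()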

Cannot read a JSON file encoded in UCS-2 Little Endian with Pandas

with open(filename + '.json') as json_file:
    data = pd.io.json.read_json(json_file, encoding='utf_16_be')
I tried multiple options for the encoding, but it fails and returns an empty object. It converts correctly only when I first save the file in Notepad++ as UTF-8 without BOM and then open it normally with the default encoding:
with open(filename + '.json') as json_file:
    data = pd.io.json.read_json(json_file)
The default encoding of the file is UCS-2 Little Endian. How do I read JSON with this encoding?
Read and follow import pandas as pd; help(pd.io.json.read_json). The following (partially commented) code snippet could help:
import pandas as pd

filename = r"D:\PShell\DataFiles\61571258"  # my test case
filepath = filename + ".json"

# define the encoding while opening the file
with open(filepath, encoding='utf-16') as f:
    data = pd.io.json.read_json(f)

# or open the file in binary mode and decode while converting to a pandas object
with open(filepath, mode='rb') as f:
    atad = pd.io.json.read_json(f, encoding='utf-16')

# ensure that both methods above are equivalent
print((data == atad).values)
Output: .\SO\69537408.py
[[ True True True True True True True]]
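As an aside, "UCS-2 Little Endian" is what Python's 'utf-16' codec handles in practice: it reads the byte order mark (BOM) and picks the right byte order automatically. You can confirm what the file contains by checking its first two bytes (a quick check, reusing filepath from above):
with open(filepath, mode='rb') as f:
    bom = f.read(2)
print(bom)  # b'\xff\xfe' indicates UTF-16/UCS-2 little endian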

Pandas reading JSON gives me ValueError: Invalid file path or buffer object type in Plotly Dash

This line:
else:
    # add to this
    nutrients_totals_df = pd.read_json(total_nutrients_json, orient='split')
is throwing the error. I write my JSON like:
nutrients_json = nutrients_df.to_json(date_format='iso', orient='split')
Then I stash it in a hidden div or dcc.Store in one callback and read it in another callback. How do I fix this error?
When I read JSON files that I've written with Pandas, I use the function below and call it inside json.loads():
import json

def read_json_file_from_local(fullpath):
    """read json file from local disk"""
    with open(fullpath, 'rb') as f:
        data = f.read().decode('utf-8')
    return data

df = json.loads(read_json_file_from_local(fullpath))
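In the Dash scenario from the question, though, the JSON is already a string in memory, so there is no file to read. Wrapping the string in a StringIO buffer is one way to give read_json the file-like object it expects (a minimal sketch, assuming total_nutrients_json holds the output of to_json(orient='split') as in the question):
import io
import pandas as pd

# read_json accepts a file-like buffer, so wrap the in-memory JSON string
nutrients_totals_df = pd.read_json(io.StringIO(total_nutrients_json), orient='split')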

Convert a pipeline_pb2.TrainEvalPipelineConfig to JSON or YAML file for tensorflow object detection API

I want to convert a pipeline_pb2.TrainEvalPipelineConfig to JSON or YAML file format for the tensorflow object detection API. I tried converting the protobuf file using:
import tensorflow as tf
from google.protobuf import text_format
import yaml
from object_detection.protos import pipeline_pb2

def get_configs_from_pipeline_file(pipeline_config_path, config_override=None):
    '''
    read .config and convert it to proto_buffer_object
    '''
    pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
    with tf.gfile.GFile(pipeline_config_path, "r") as f:
        proto_str = f.read()
        text_format.Merge(proto_str, pipeline_config)
    if config_override:
        text_format.Merge(config_override, pipeline_config)
    # print(pipeline_config)
    return pipeline_config

def create_configs_from_pipeline_proto(pipeline_config):
    '''
    Returns the configurations as dictionary
    '''
    configs = {}
    configs["model"] = pipeline_config.model
    configs["train_config"] = pipeline_config.train_config
    configs["train_input_config"] = pipeline_config.train_input_reader
    configs["eval_config"] = pipeline_config.eval_config
    configs["eval_input_configs"] = pipeline_config.eval_input_reader
    # Keep eval_input_config only for backwards compatibility. All clients should
    # read eval_input_configs instead.
    if configs["eval_input_configs"]:
        configs["eval_input_config"] = configs["eval_input_configs"][0]
    if pipeline_config.HasField("graph_rewriter"):
        configs["graph_rewriter_config"] = pipeline_config.graph_rewriter
    return configs

configs = get_configs_from_pipeline_file('pipeline.config')
config_as_dict = create_configs_from_pipeline_proto(configs)
But when I try converting the returned dictionary to YAML with yaml.dump(config_as_dict), it says:
TypeError: can't pickle google.protobuf.pyext._message.RepeatedCompositeContainer objects
For json.dumps(config_as_dict) it says:
Traceback (most recent call last):
  File "config_file_parsing.py", line 48, in <module>
    config_as_json = json.dumps(config_as_dict)
  File "/usr/lib/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: label_map_path: "label_map.pbtxt"
shuffle: true
tf_record_input_reader {
  input_path: "dataset.record"
}
is not JSON serializable
Would appreciate some help here.
JSON can only dump a subset of the Python primitives plus the dict and list collections (with limitations on self-referencing).
YAML is more powerful, and can be used to dump arbitrary Python objects, but only if those objects can be "investigated" during the representation phase of the dump, which essentially limits that to instances of pure Python classes. For objects created at the C level, one can write explicit dumpers, and if none is available Python will try to use the pickle protocol to dump the data to YAML.
Inspecting protobuf on PyPI shows that there are non-generic wheels available, which is always an indication of some C code optimization, and inspecting one of those files indeed shows a pre-compiled shared object.
Although you make a dict out of the config, that dict can of course only be dumped when all its keys and all its values can be dumped. Since your keys are strings (as JSON requires), you need to look at each of the values, find the ones that don't dump, and convert them to a dumpable object structure (dict/list for JSON, pure Python classes for YAML).
You might want to take a look at the google.protobuf.json_format module, sketched below.
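A minimal sketch of that route, assuming pipeline_config is the TrainEvalPipelineConfig message returned by get_configs_from_pipeline_file above (json_format ships with the protobuf package):
from google.protobuf import json_format
import yaml

# serialize the whole protobuf message to a JSON string
config_as_json = json_format.MessageToJson(pipeline_config)

# or convert it to a plain dict of primitives, which json and yaml can both dump
config_as_plain_dict = json_format.MessageToDict(pipeline_config)
config_as_yaml = yaml.dump(config_as_plain_dict)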

How to save twitterscraper output as json file

I read the documentation, but it only mentions saving the output as a .txt file, so I tried to modify the code to save the output as JSON.
save as .txt:
from twitterscraper import query_tweets

if __name__ == '__main__':
    list_of_tweets = query_tweets("Trump OR Clinton", 10)

    # print the retrieved tweets to the screen:
    for tweet in query_tweets("Trump OR Clinton", 10):
        print(tweet)

    # or save the retrieved tweets to file:
    file = open("output.txt", "w")
    for tweet in query_tweets("Trump OR Clinton", 10):
        file.write(tweet.encode('utf-8'))
    file.close()
I tried to modify this to save as JSON:
output = query_tweets("Trump OR Clinton", 10)
jsonfile = open("tweets.json", "w")
for tweet in output:
    json.dump(tweet, jsonfile)
jsonfile.close()
But I get this error:
TypeError: Object of type Tweet is not JSON serializable
How can I save output as JSON?
I know that typing the command in the terminal creates JSON, but I wanted to write a Python version.
We'll need to convert each tweet to a dict first, as Python class objects are not serializable as JSON. Looking at the first object we can see the available methods and attributes like this: help(list_of_tweets[0]). Accessing the __dict__ of the first object we see:
# print(list_of_tweets[0].__dict__)
{'user': 'foobar',
'fullname': 'foobar',
'id': '143846459132929',
'url': '/foobar/status/1438420459132929',
'timestamp': datetime.datetime(2011, 12, 5, 23, 59, 53),
'text': 'blah blah',
'replies': 0,
'retweets': 0,
'likes': 0,
'html': '<p class="TweetTextSize...'}
Before we can dump it to json we'll need to convert the datetime objects to strings.
tweets = [t.__dict__ for t in list_of_tweets]
for t in tweets:
    t['timestamp'] = t['timestamp'].isoformat()
Then we can use the json module to dump the data to a file.
import json

with open('data.json', 'w') as f:
    json.dump(tweets, f)
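An alternative is to let json do the conversion via its default hook, which is called for every object the encoder can't serialize on its own; passing default=str simply stringifies such values (a sketch, which replaces the explicit ISO formatting above with str(datetime)):
import json

tweets = [t.__dict__ for t in list_of_tweets]
with open('data.json', 'w') as f:
    json.dump(tweets, f, default=str)  # datetime objects become 'YYYY-MM-DD HH:MM:SS' strings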