Convert a Bytes String to Dictionary in Python - json

Basic Information
I am creating a Python script that can encrypt and decrypt a file containing previous session data.
The Problem
I am able to decrypt my file and read it using a key. This returns a bytes string which I can in turn convert to a string. However, this string needs to be converted to a dictionary, which I cannot do. Using ast, json and eval I have run into errors.
Bytes string
decrypted = fernet.decrypt(encrypted)
String
string = decrypted.decode("UTF-8").replace("'", '"')
If I use eval() or ast.literal_eval() I get the following error:
Then I tried using json.loads() and I get the following error:
The information blocked out in both images is to protect my SSH connections. In the first image it is giving me a SyntaxError at the last digit of my IP address.
The Function
The function that is responsible for this when called looks like this:
def FileDecryption():
    with open('enc_key.key', 'rb') as filekey:
        key = filekey.read()
    filekey.close()
    fernet = Fernet(key)
    with open('saved_data.txt', 'rb') as enc_file:
        encrypted = enc_file.read()
    enc_file.close()
    decrypted = fernet.decrypt(encrypted)
    print(decrypted)
    string = decrypted.decode("UTF-8").replace("'", '"')
    data = f'{string}'
    print(data)
    #data = eval(data)
    data = json.loads(data)
    print(type(data))
    for key in data:
        #command_string = ["load", data[key][1], data[key][2], data[key][3], data[key][4]]
        #SSH.CreateSSH(command_string)
        print(key)
Any help would be appreciated. Thanks!

Your data seems like it was written incorrectly in the first place, but without a complete example it's hard to say.
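As one illustration of why the quote-swapping approach is fragile: the decrypted bytes likely hold a Python repr of a dict rather than JSON, and replace("'", '"') corrupts any value that itself contains a quote character:
# Illustration only: repr output is not JSON, and swapping quotes
# breaks values that contain an apostrophe
s = str({'host': "ender's-box"})   # {'host': "ender's-box"}
s = s.replace("'", '"')            # {"host": "ender"s-box"}  -> invalid JSON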
Here's a complete example that round-trips a JSON-able data object.
# requirement:
# pip install cryptography
from cryptography.fernet import Fernet
import json

def encrypt(data, data_filename, key_filename):
    key = Fernet.generate_key()
    with open(key_filename, 'wb') as file:
        file.write(key)
    fernet = Fernet(key)
    encrypted = fernet.encrypt(json.dumps(data).encode())
    with open(data_filename, 'wb') as file:
        file.write(encrypted)

def decrypt(data_filename, key_filename):
    with open(key_filename, 'rb') as file:
        key = file.read()
    fernet = Fernet(key)
    with open(data_filename, 'rb') as file:
        return json.loads(fernet.decrypt(file.read()))

data = {'key1': 'value1', 'key2': 'value2'}
encrypt(data, 'saved_data.txt', 'enc_key.key')
decrypted = decrypt('saved_data.txt', 'enc_key.key')
print(decrypted)
Output:
{'key1': 'value1', 'key2': 'value2'}

Related

Convert csv to json with django admin actions

Register a CSV in django-admin and, through a django-admin action, convert it to JSON and store the value in a JSONField.
However, in the django-admin action I'm getting this error and can't convert it to JSON...
admin.py
(....)
def read_data_csv(path):
    with open(path, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        data = []
        for row in reader:
            data.append(dict(row))
        return data

def convert(modeladmin, request, queryset):
    for extraction in queryset:
        csv_file_path = extraction.lawsuits
        read_data_csv(csv_file_path)
Error:
TypeError at /admin/core/extraction/
expected str, bytes or os.PathLike object, not FieldFile
This is about extraction.lawsuits, which is a FieldFile instance. Just pass csv_file_path.path to the read_data_csv function as the argument. That should work.
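A minimal sketch of that change, using the action from the question:
def convert(modeladmin, request, queryset):
    for extraction in queryset:
        # FieldFile is not a path; its .path attribute holds the
        # filesystem path that open() expects
        read_data_csv(extraction.lawsuits.path)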

Why does Pycryptodome MAC check fail when encrypting and decrypting JSON files?

I am trying to do encrypt some JSON data with AES-256, using a password hashed with pbkdf2_sha256 as the key. I want to store the data in a file, be able to load it up, decrypt it, alter it, encrypt it, store it, and repeat.
I am using the passlib and pycryptodome libraries with Python 3.8. The following test runs inside a Docker container and throws an error I haven't been able to correct.
Does anyone have any clues on how I can improve my code (and knowledge)?
Test.py:
import os, json
from Crypto.PublicKey import RSA
from Crypto.Cipher import AES
from passlib.hash import pbkdf2_sha256

def setJsonData(jsonData, jsonFileName):
    with open(jsonFileName, 'wb') as jsonFile:
        password = 'd'
        key = pbkdf2_sha256.hash(password)[-16:]
        data = json.dumps(jsonData).encode("utf8")
        cipher = AES.new(key.encode("utf8"), AES.MODE_EAX)
        ciphertext, tag = cipher.encrypt_and_digest(data)
        [ jsonFile.write(x) for x in (cipher.nonce, tag, ciphertext) ]

def getJsonData(jsonFileName):
    with open(jsonFileName, 'rb') as jsonFile:
        password = 'd'
        key = pbkdf2_sha256.hash(password)[-16:]
        nonce, tag, ciphertext = [ jsonFile.read(x) for x in (16, 16, -1) ]
        cipher = AES.new(key.encode("utf8"), AES.MODE_EAX, nonce)
        data = cipher.decrypt_and_verify(ciphertext, tag)
        return json.loads(data)

dictTest = {}
dictTest['test'] = 1
print(str(dictTest))
setJsonData(dictTest, "test")
dictTest = getJsonData("test")
print(str(dictTest))
Output:
{'test': 1}
Traceback (most recent call last):
  File "test.py", line 37, in <module>
    dictTest = getJsonData("test")
  File "test.py", line 24, in getJsonData
    data = cipher.decrypt_and_verify(ciphertext, tag)
  File "/usr/local/lib/python3.8/site-packages/Crypto/Cipher/_mode_eax.py", line 368, in decrypt_and_verify
    self.verify(received_mac_tag)
  File "/usr/local/lib/python3.8/site-packages/Crypto/Cipher/_mode_eax.py", line 309, in verify
    raise ValueError("MAC check failed")
ValueError: MAC check failed
Research:
Looked into this answer, but I believe my verify() call is in the right place.
I noted that the Python docs say:
loads(dumps(x)) != x if x has non-string keys.
But when I re-run the test with dictTest['test'] = 'a' I get the same error.
I suspected the problem was the JSON formatting, so I ran the same test with a plain string, skipping the json.loads and json.dumps calls, but I got the same error.
The problem here is that key = pbkdf2_sha256.hash(password)[-16:] derives the key with a new random salt on each call. The cipher used to encrypt and the cipher used to decrypt therefore use different keys, so decryption yields different data and the integrity check fails.
I changed my key derivation function to the following:
from Crypto.Hash import SHA3_256

h = SHA3_256.new()
h.update(password.encode("utf-8"))
key = h.digest()
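If you'd rather keep a real password-based KDF than a plain hash, a minimal sketch (assuming pycryptodome's PBKDF2; the salt must be stored alongside the nonce, tag and ciphertext so decryption derives the same key):
from Crypto.Protocol.KDF import PBKDF2
from Crypto.Random import get_random_bytes

salt = get_random_bytes(16)              # persist this with the ciphertext
key = PBKDF2(password, salt, dkLen=32)   # 32-byte key for AES-256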

keep getting error message when trying to read json from s3

I keep getting this error in my lambda function:
{"errorMessage": "module initialization error"}
This happens when I try to turn the following string containing JSON data into a JSON dictionary object within Python.
"{\n\"main\": {\n \"PART_NAME\": \"Genuine Cardboard Honda Wing\",\n \"BRAND\": \"Honda\",\n \"MJR_CAT\": \"Aero\",\n \"CAT\": \"Rear Wing\",\n \"SUB_CAT\": \"NA\",\n \"Power_Increase\": \"0\"\n},\n\"forza\":\n{\n \"power\": \"[0, True]\",\n \"Torque\": \"[0, True]\",\n \"Traction\": \"[50, True]\",\n \"Handling\": \"[100, True]\",\n \"Breaking\": \"[40, True]\"\n},\n\"custom\": {\n\"length\": 120,\n\"car max height[m]\": 2,\n\"RICER RANK\": -10\n\n}\n"
Here is my code to replicate this error:
client = boto3.client('s3')
result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result['Body'].read().decode('utf-8')
text = json.load(text)
print(text)
With the json.load line commented out, print(text) produces the string above.
Thanks :)
Here is the full lambda function (though not commented) if you are interested.
import json
import boto3

print('got this far')

BUCKET = '******'
FILE_TO_READ = 'example_honda_wing.json'
client = boto3.client('s3')
result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result['Body'].read().decode('utf-8')
#text = str(text).replace("\n","")
#text = text.replace('\"',' ')
#text = json.load(text)
print(text) # Use your desired JSON Key for your value

def lambda_handler(event, context):
    # TODO implement
    return text
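One note on the commented-out parsing line: json.load expects a file-like object, while json.loads parses a string, so calling json.load(text) on the decoded body would raise. A minimal sketch of the string form, using sample data from the question:
import json

# json.loads parses a string; json.load reads from a file-like object
body = '{"main": {"PART_NAME": "Genuine Cardboard Honda Wing"}}'
data = json.loads(body)
print(data['main']['PART_NAME'])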

How to work with JSON in python

My Python script gives JSON output. How can I see it in a proper JSON format?
I tried parsing with json.dumps() and json.loads(), but could not achieve the desired result.
======= Myscript.py ========
import sys
import jenkins
import json
import credentials
# Credentials
username = credentials.login['username']
password = credentials.login['password']
# Print the number of jobs present in jenkins
server = jenkins.Jenkins('http://localhost:8080', username=username, password=password)
# Get the installed Plugin info
plugins = server.get_plugins_info()
#parsed = json.loads(plugins) # take a string as input and returns a dictionary as output.
parsed = json.dumps(plugins) # take a dictionary as input and returns a string as output.
#print(json.dumps(parsed, indent=4, sort_keys=True))
print(plugins)
print(parsed)
It sounds like you want to pretty-print your JSON. You would need to pass the correct parameters to json.dumps():
parsed = json.dumps(plugins, sort_keys=True, indent=4)
Check and see if that is what you are looking for.
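For illustration, with made-up plugin data standing in for server.get_plugins_info(), the pretty-printed output looks like this:
import json

# Hypothetical plugin data; real output comes from server.get_plugins_info()
plugins = {'mailer': {'version': '408'}, 'git': {'version': '4.8.2'}}
print(json.dumps(plugins, sort_keys=True, indent=4))
which prints:
{
    "git": {
        "version": "4.8.2"
    },
    "mailer": {
        "version": "408"
    }
}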

Reading the data written to s3 by Amazon Kinesis Firehose stream

I am writing records to a Kinesis Firehose stream that are eventually written to an S3 file by Amazon Kinesis Firehose.
My record object looks like
ItemPurchase {
    String personId,
    String itemId
}
The data written to S3 looks like:
{"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}{"personId":"p-333","itemId":"i-333"}
NO COMMA SEPARATION.
NO STARTING BRACKET, as in a JSON array: [
NO ENDING BRACKET, as in a JSON array: ]
I want to read this data and get a list of ItemPurchase objects.
List<ItemPurchase> purchases = getPurchasesFromS3(IOUtils.toString(s3ObjectContent))
What is the correct way to read this data?
It boggles my mind that Amazon Firehose dumps JSON messages to S3 in this manner, and doesn't allow you to set a delimiter or anything.
Ultimately, the trick I found to deal with the problem was to process the text file using the JSON raw_decode method.
This will allow you to read a bunch of concatenated JSON records without any delimiters between them.
Python code:
import json

decoder = json.JSONDecoder()
with open('giant_kinesis_s3_text_file_with_concatenated_json_blobs.txt', 'r') as content_file:
    content = content_file.read()

content_length = len(content)
decode_index = 0

while decode_index < content_length:
    try:
        obj, decode_index = decoder.raw_decode(content, decode_index)
        print("File index:", decode_index)
        print(obj)
    except json.JSONDecodeError as e:
        print("JSONDecodeError:", e)
        # Scan forward and keep trying to decode
        decode_index += 1
I also had the same problem; here is how I solved it:
replace "}{" with "}\n{", then
split lines by "\n":
input_json_rdd.map(lambda x: re.sub("}{", "}\n{", x, flags=re.UNICODE)) \
              .flatMap(lambda line: line.split("\n"))
A nested JSON object has several "}"s, so splitting lines on "}" alone doesn't solve the problem.
I've had the same issue.
It would have been better if AWS allowed us to set a delimiter but we can do it on our own.
In my use case, I've been listening on a stream of tweets, and once receiving a new tweet I immediately put it to Firehose.
This, of course, resulted in a 1-line file which could not be parsed.
So, to solve this, I have concatenated the tweet's JSON with a \n.
This, in turn, let me use some packages that can output lines when reading stream contents, and parse the file easily.
Hope this helps you.
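A sketch of that write path, assuming boto3's Firehose client and a hypothetical stream name (the fix is the trailing "\n"):
import json
import boto3

firehose = boto3.client('firehose')

def put_tweet(tweet: dict) -> None:
    # Append "\n" so each record lands on its own line in S3
    firehose.put_record(
        DeliveryStreamName='my-tweet-stream',   # hypothetical name
        Record={'Data': (json.dumps(tweet) + '\n').encode('utf-8')},
    )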
I think the best way to tackle this is to first create a properly formatted JSON file containing well-separated JSON objects. In my case I appended ',' to the events that were pushed into the Firehose, so after a file is saved in S3 it contains JSON objects separated by a delimiter (a comma, in our case). You must also add '[' and ']' at the beginning and end of the file. Then you have a proper JSON file containing multiple JSON objects, and parsing them is possible.
If the input source for the Firehose is an Analytics application, this concatenated JSON without a delimiter is a known issue, as cited here. You should have a Lambda function, like the one here, that outputs JSON objects on multiple lines.
I used a transformation Lambda to add a line break at the end of every record
import base64
import copy

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Decode from base64 (Firehose records are base64 encoded)
        payload = base64.b64decode(record['data'])
        # Read json as utf-8
        json_string = payload.decode("utf-8")
        # Add a line break
        output_json_with_line_break = json_string + "\n"
        # Encode the data
        encoded_bytes = base64.b64encode(bytearray(output_json_with_line_break, 'utf-8'))
        encoded_string = str(encoded_bytes, 'utf-8')
        # Create a deep copy of the record and append to output with transformed data
        output_record = copy.deepcopy(record)
        output_record['data'] = encoded_string
        output_record['result'] = 'Ok'
        output.append(output_record)
    print('Successfully processed {} records.'.format(len(event['records'])))
    return {'records': output}
Use this simple Python code:
import json

input_str = '''{"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}{"personId":"p-333","itemId":"i-333"}'''
data_str = "[{}]".format(input_str.replace("}{","},{"))
data_json = json.loads(data_str)
And then (if you want) convert to Pandas.
import pandas as pd
df = pd.DataFrame().from_records(data_json)
print(df)
And this is the result:
  itemId personId
0  i-111    p-111
1  i-222    p-222
2  i-333    p-333
If there's a way to change the way the data is written, separate all the records with a line break; that way you can read the data simply, line by line. If not, build a scanner object that takes "}" as a delimiter and use the scanner to read. That would do the job.
You can find each valid JSON object by counting the braces. Assuming the file starts with a {, this Python snippet should work:
import json

def read_block(stream):
    open_brackets = 0
    block = ''
    while True:
        c = stream.read(1)
        if not c:
            break
        if c == '{':
            open_brackets += 1
        elif c == '}':
            open_brackets -= 1
        block += c
        if open_brackets == 0:
            yield block
            block = ''

if __name__ == "__main__":
    with open('firehose_json_blob', 'r') as f:
        for block in read_block(f):
            record = json.loads(block)
            print(record)
This problem can be solved with a JSON parser that consumes objects one at a time from a stream. The raw_decode method of the JSONDecoder exposes just such a parser, but I've written a library that makes it straightforward to do this with a one-liner.
from firehose_sipper import sip

for entry in sip(bucket=..., key=...):
    do_something_with(entry)
I've added some more details in this blog post
In Spark, we had the same problem. We're using the following:
import re

from pyspark.sql.functions import *

@udf
def concatenated_json_to_array(text):
    final = "["
    separator = ""
    for part in text.split("}{"):
        final += separator + part
        separator = "}{" if re.search(r':\s*"([^"]|(\\"))*$', final) else "},{"
    return final + "]"

def read_concatenated_json(path, schema):
    return (spark.read
            .option("lineSep", None)
            .text(path)
            .withColumn("value", concatenated_json_to_array("value"))
            .withColumn("value", from_json("value", schema))
            .withColumn("value", explode("value"))
            .select("value.*"))
It works as follows:
Read the data as one string per file (no delimiters!).
Use a UDF to wrap it in a JSON array, splitting the JSON objects by inserting commas. Note: be careful not to break any strings with }{ in them!
Parse the JSON with a schema into DataFrame fields.
Explode the array into separate rows.
Expand the value object into columns.
Use it like this:
from pyspark.sql.types import *

schema = ArrayType(
    StructType([
        StructField("type", StringType(), True),
        StructField("value", StructType([
            StructField("id", IntegerType(), True),
            StructField("joke", StringType(), True),
            StructField("categories", ArrayType(StringType()), True)
        ]), True)
    ])
)

path = '/mnt/my_bucket_name/messages/*/*/*/*/'
df = read_concatenated_json(path, schema)
I've written more details and considerations here: Parsing JSON data from S3 (Kinesis) with Spark. Do not just split by }{, as it can mess up your string data! For example: { "line": "a\"r}{t" }.
You can use the script below. If the streamed data size does not exceed the buffer size you set, each S3 file will contain one pair of brackets ([]) with comma-separated records.
import base64

print('Loading function')

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        print(record['recordId'])
        payload = base64.b64decode(record['data']).decode('utf-8') + ',\n'
        # Do custom processing on the payload here
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(payload.encode('utf-8'))
        }
        output.append(output_record)
    last = len(event['records']) - 1
    print('Successfully processed {} records.'.format(len(event['records'])))
    start = '[' + base64.b64decode(output[0]['data']).decode('utf-8')
    # Strip the trailing comma from the last record before closing the array
    end = base64.b64decode(output[last]['data']).decode('utf-8').rstrip(',\n') + '\n]'
    output[0]['data'] = base64.b64encode(start.encode('utf-8'))
    output[last]['data'] = base64.b64encode(end.encode('utf-8'))
    return {'records': output}
Using a JavaScript regex:
JSON.parse(`[${item.replace(/}\s*{/g, '},{')}]`);