Convert a Bytes String to Dictionary in Python - json

Basic Information
I am creating a Python script that can encrypt and decrypt a file containing previous session data.
The Problem
I am able to decrypt my file and read it using a key. This returns a bytes string which I can in turn convert to a string. However, this string needs to be converted to a dictionary, which I cannot do. Using ast, json and eval I have run into errors.
Bytes string
decrypted = fernet.decrypt(encrypted)
String
string = decrypted.decode("UTF-8").replace("'", '"')
If I use eval() or ast.literal_eval() I get the following error:
Then I tried using json.loads() and I get the following error:
The information blocked out in both images is to protect my SSH connections. In the first image it is giving me a SyntaxError at the last digit of my IP address.
The Function
The function that is responsible for this when called looks like this:
def FileDecryption():
    with open('enc_key.key', 'rb') as filekey:
        key = filekey.read()
    filekey.close()
    fernet = Fernet(key)
    with open('saved_data.txt', 'rb') as enc_file:
        encrypted = enc_file.read()
    enc_file.close()
    decrypted = fernet.decrypt(encrypted)
    print(decrypted)
    string = decrypted.decode("UTF-8").replace("'", '"')
    data = f'{string}'
    print(data)
    #data = eval(data)
    data = json.loads(data)
    print(type(data))
    for key in data:
        #command_string = ["load", data[key][1], data[key][2], data[key][3], data[key][4]]
        #SSH.CreateSSH(command_string)
        print(key)
Any help would be appreciated. Thanks!

Your data seems like it was written incorrectly in the first place, but without a complete example it's hard to say.
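As one illustration of why the quote-swapping approach is fragile: the decrypted bytes likely hold a Python repr of a dict rather than JSON, and replace("'", '"') corrupts any value that itself contains a quote character:
# Illustration only: repr output is not JSON, and swapping quotes
# breaks values that contain an apostrophe
s = str({'host': "ender's-box"})   # {'host': "ender's-box"}
s = s.replace("'", '"')            # {"host": "ender"s-box"}  -> invalid JSON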
Here's a complete example that round-trips a JSON-able data object.
# requirement:
# pip install cryptography
from cryptography.fernet import Fernet
import json

def encrypt(data, data_filename, key_filename):
    key = Fernet.generate_key()
    with open(key_filename, 'wb') as file:
        file.write(key)
    fernet = Fernet(key)
    encrypted = fernet.encrypt(json.dumps(data).encode())
    with open(data_filename, 'wb') as file:
        file.write(encrypted)

def decrypt(data_filename, key_filename):
    with open(key_filename, 'rb') as file:
        key = file.read()
    fernet = Fernet(key)
    with open(data_filename, 'rb') as file:
        return json.loads(fernet.decrypt(file.read()))

data = {'key1': 'value1', 'key2': 'value2'}
encrypt(data, 'saved_data.txt', 'enc_key.key')
decrypted = decrypt('saved_data.txt', 'enc_key.key')
print(decrypted)
Output:
{'key1': 'value1', 'key2': 'value2'}

Related

Convert csv to json with django admin actions

Register a CSV in django-admin and, through a django-admin action, convert it to JSON and store the value in a JSONField.
However, in the django-admin action I'm getting this error and can't convert it to JSON...
admin.py
(....)
def read_data_csv(path):
    with open(path, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        data = []
        for row in reader:
            data.append(dict(row))
        return data

def convert(modeladmin, request, queryset):
    for extraction in queryset:
        csv_file_path = extraction.lawsuits
        read_data_csv(csv_file_path)
Error:
TypeError at /admin/core/extraction/
expected str, bytes or os.PathLike object, not FieldFile
This is about extraction.lawsuits, which is a FieldFile instance. Just pass csv_file_path.path to the read_data_csv function as the argument. That should work.
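A minimal sketch of that change, using the action from the question:
def convert(modeladmin, request, queryset):
    for extraction in queryset:
        # FieldFile is not a path; its .path attribute holds the
        # filesystem path that open() expects
        read_data_csv(extraction.lawsuits.path)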

Why does Pycryptodome MAC check fail when encrypting and decrypting JSON files?

I am trying to do encrypt some JSON data with AES-256, using a password hashed with pbkdf2_sha256 as the key. I want to store the data in a file, be able to load it up, decrypt it, alter it, encrypt it, store it, and repeat.
I am using the passlib and pycryptodome libraries with Python 3.8. The following test runs inside a Docker container and throws an error I haven't been able to correct.
Does anyone have any clues on how I can improve my code (and knowledge)?
Test.py:
import os, json
from Crypto.PublicKey import RSA
from Crypto.Cipher import AES
from passlib.hash import pbkdf2_sha256

def setJsonData(jsonData, jsonFileName):
    with open(jsonFileName, 'wb') as jsonFile:
        password = 'd'
        key = pbkdf2_sha256.hash(password)[-16:]
        data = json.dumps(jsonData).encode("utf8")
        cipher = AES.new(key.encode("utf8"), AES.MODE_EAX)
        ciphertext, tag = cipher.encrypt_and_digest(data)
        [ jsonFile.write(x) for x in (cipher.nonce, tag, ciphertext) ]

def getJsonData(jsonFileName):
    with open(jsonFileName, 'rb') as jsonFile:
        password = 'd'
        key = pbkdf2_sha256.hash(password)[-16:]
        nonce, tag, ciphertext = [ jsonFile.read(x) for x in (16, 16, -1) ]
        cipher = AES.new(key.encode("utf8"), AES.MODE_EAX, nonce)
        data = cipher.decrypt_and_verify(ciphertext, tag)
        return json.loads(data)

dictTest = {}
dictTest['test'] = 1
print(str(dictTest))
setJsonData(dictTest, "test")
dictTest = getJsonData("test")
print(str(dictTest))
Output:
{'test': 1}
Traceback (most recent call last):
  File "test.py", line 37, in <module>
    dictTest = getJsonData("test")
  File "test.py", line 24, in getJsonData
    data = cipher.decrypt_and_verify(ciphertext, tag)
  File "/usr/local/lib/python3.8/site-packages/Crypto/Cipher/_mode_eax.py", line 368, in decrypt_and_verify
    self.verify(received_mac_tag)
  File "/usr/local/lib/python3.8/site-packages/Crypto/Cipher/_mode_eax.py", line 309, in verify
    raise ValueError("MAC check failed")
ValueError: MAC check failed
Research:
Looked into this answer, but I believe my verify() call is in the right place.
I noted that the Python docs say:
loads(dumps(x)) != x if x has non-string keys.
But when I re-run the test with dictTest['test'] = 'a' I get the same error.
I suspected the problem was the JSON formatting, so I ran the same test with a plain string, skipping the json.loads and json.dumps calls, but I got the same error.
The problem here is that key = pbkdf2_sha256.hash(password)[-16:] derives the key with a new random salt on each call. The cipher used to encrypt and the cipher used to decrypt therefore use different keys, so decryption yields different data and the integrity check fails.
I changed my key derivation function to the following:
from Crypto.Hash import SHA3_256

h = SHA3_256.new()
h.update(password.encode("utf-8"))
key = h.digest()
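If you'd rather keep a real password-based KDF than a plain hash, a minimal sketch (assuming pycryptodome's PBKDF2; the salt must be stored alongside the nonce, tag and ciphertext so decryption derives the same key):
from Crypto.Protocol.KDF import PBKDF2
from Crypto.Random import get_random_bytes

salt = get_random_bytes(16)              # persist this with the ciphertext
key = PBKDF2(password, salt, dkLen=32)   # 32-byte key for AES-256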

keep getting error message when trying to read json from s3

I keep getting this error in my lambda function:
{"errorMessage": "module initialization error"}
This happens when I try to turn the following string containing JSON data into a JSON dictionary object within Python.
"{\n\"main\": {\n \"PART_NAME\": \"Genuine Cardboard Honda Wing\",\n \"BRAND\": \"Honda\",\n \"MJR_CAT\": \"Aero\",\n \"CAT\": \"Rear Wing\",\n \"SUB_CAT\": \"NA\",\n \"Power_Increase\": \"0\"\n},\n\"forza\":\n{\n \"power\": \"[0, True]\",\n \"Torque\": \"[0, True]\",\n \"Traction\": \"[50, True]\",\n \"Handling\": \"[100, True]\",\n \"Breaking\": \"[40, True]\"\n},\n\"custom\": {\n\"length\": 120,\n\"car max height[m]\": 2,\n\"RICER RANK\": -10\n\n}\n"
Here is my code to replicate this error:
client = boto3.client('s3')
result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result['Body'].read().decode('utf-8')
text = json.load(text)
print(text)
With the json.load line commented out, print(text) produces the string above.
Thanks :)
Here is the full lambda function (though not commented) if you are interested.
import json
import boto3

print('got this far')

BUCKET = '******'
FILE_TO_READ = 'example_honda_wing.json'
client = boto3.client('s3')
result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result['Body'].read().decode('utf-8')
#text = str(text).replace("\n","")
#text = text.replace('\"',' ')
#text = json.load(text)
print(text) # Use your desired JSON Key for your value

def lambda_handler(event, context):
    # TODO implement
    return text
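One note on the commented-out parsing line: json.load expects a file-like object, while json.loads parses a string, so calling json.load(text) on the decoded body would raise. A minimal sketch of the string form, using sample data from the question:
import json

# json.loads parses a string; json.load reads from a file-like object
body = '{"main": {"PART_NAME": "Genuine Cardboard Honda Wing"}}'
data = json.loads(body)
print(data['main']['PART_NAME'])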

How to work with JSON in python

My Python script gives JSON output. How can I see it in a proper JSON format?
I tried parsing with json.dumps() and json.loads(), but could not achieve the desired result.
======= Myscript.py ========
import sys
import jenkins
import json
import credentials
# Credentials
username = credentials.login['username']
password = credentials.login['password']
# Print the number of jobs present in jenkins
server = jenkins.Jenkins('http://localhost:8080', username=username, password=password)
# Get the installed Plugin info
plugins = server.get_plugins_info()
#parsed = json.loads(plugins) # take a string as input and returns a dictionary as output.
parsed = json.dumps(plugins) # take a dictionary as input and returns a string as output.
#print(json.dumps(parsed, indent=4, sort_keys=True))
print(plugins)
print(parsed)
It sounds like you want to pretty-print your JSON. You would need to pass the correct parameters to json.dumps():
parsed = json.dumps(plugins, sort_keys=True, indent=4)
Check and see if that is what you are looking for.
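For illustration, with made-up plugin data standing in for server.get_plugins_info(), the pretty-printed output looks like this:
import json

# Hypothetical plugin data; real output comes from server.get_plugins_info()
plugins = {'mailer': {'version': '408'}, 'git': {'version': '4.8.2'}}
print(json.dumps(plugins, sort_keys=True, indent=4))
which prints:
{
    "git": {
        "version": "4.8.2"
    },
    "mailer": {
        "version": "408"
    }
}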

Reading the data written to s3 by Amazon Kinesis Firehose stream

I am writing records to a Kinesis Firehose stream that are eventually written to an S3 file by Amazon Kinesis Firehose.
My record object looks like
ItemPurchase {
    String personId,
    String itemId
}
The data written to S3 looks like:
{"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}{"personId":"p-333","itemId":"i-333"}
NO COMMA SEPARATION.
NO STARTING BRACKET, as in a JSON array: [
NO ENDING BRACKET, as in a JSON array: ]
I want to read this data and get a list of ItemPurchase objects.
List<ItemPurchase> purchases = getPurchasesFromS3(IOUtils.toString(s3ObjectContent))
What is the correct way to read this data?
It boggles my mind that Amazon Firehose dumps JSON messages to S3 in this manner, and doesn't allow you to set a delimiter or anything.
Ultimately, the trick I found to deal with the problem was to process the text file using the JSON raw_decode method.
This will allow you to read a bunch of concatenated JSON records without any delimiters between them.
Python code:
import json

decoder = json.JSONDecoder()
with open('giant_kinesis_s3_text_file_with_concatenated_json_blobs.txt', 'r') as content_file:
    content = content_file.read()

content_length = len(content)
decode_index = 0

while decode_index < content_length:
    try:
        obj, decode_index = decoder.raw_decode(content, decode_index)
        print("File index:", decode_index)
        print(obj)
    except json.JSONDecodeError as e:
        print("JSONDecodeError:", e)
        # Scan forward and keep trying to decode
        decode_index += 1
I also had the same problem; here is how I solved it:
replace "}{" with "}\n{", then
split lines by "\n":
input_json_rdd.map(lambda x: re.sub("}{", "}\n{", x, flags=re.UNICODE)) \
              .flatMap(lambda line: line.split("\n"))
A nested JSON object has several "}"s, so splitting lines on "}" alone doesn't solve the problem.
I've had the same issue.
It would have been better if AWS allowed us to set a delimiter but we can do it on our own.
In my use case, I've been listening on a stream of tweets, and once receiving a new tweet I immediately put it to Firehose.
This, of course, resulted in a 1-line file which could not be parsed.
So, to solve this, I have concatenated the tweet's JSON with a \n.
This, in turn, let me use some packages that can output lines when reading stream contents, and parse the file easily.
Hope this helps you.
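A sketch of that write path, assuming boto3's Firehose client and a hypothetical stream name (the fix is the trailing "\n"):
import json
import boto3

firehose = boto3.client('firehose')

def put_tweet(tweet: dict) -> None:
    # Append "\n" so each record lands on its own line in S3
    firehose.put_record(
        DeliveryStreamName='my-tweet-stream',   # hypothetical name
        Record={'Data': (json.dumps(tweet) + '\n').encode('utf-8')},
    )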
I think the best way to tackle this is to first create a properly formatted JSON file containing well-separated JSON objects. In my case I appended ',' to the events that were pushed into the Firehose, so after a file is saved in S3 it contains JSON objects separated by a delimiter (a comma, in our case). You must also add '[' and ']' at the beginning and end of the file. Then you have a proper JSON file containing multiple JSON objects, and parsing them is possible.
If the input source for the Firehose is an Analytics application, this concatenated JSON without a delimiter is a known issue, as cited here. You should have a Lambda function, like the one here, that outputs JSON objects on multiple lines.
I used a transformation Lambda to add a line break at the end of every record
import base64
import copy

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Decode from base64 (Firehose records are base64 encoded)
        payload = base64.b64decode(record['data'])
        # Read json as utf-8
        json_string = payload.decode("utf-8")
        # Add a line break
        output_json_with_line_break = json_string + "\n"
        # Encode the data
        encoded_bytes = base64.b64encode(bytearray(output_json_with_line_break, 'utf-8'))
        encoded_string = str(encoded_bytes, 'utf-8')
        # Create a deep copy of the record and append to output with transformed data
        output_record = copy.deepcopy(record)
        output_record['data'] = encoded_string
        output_record['result'] = 'Ok'
        output.append(output_record)
    print('Successfully processed {} records.'.format(len(event['records'])))
    return {'records': output}
Use this simple Python code:
import json

input_str = '''{"personId":"p-111","itemId":"i-111"}{"personId":"p-222","itemId":"i-222"}{"personId":"p-333","itemId":"i-333"}'''
data_str = "[{}]".format(input_str.replace("}{","},{"))
data_json = json.loads(data_str)
And then (if you want) convert to Pandas.
import pandas as pd
df = pd.DataFrame().from_records(data_json)
print(df)
And this is the result:
  itemId personId
0  i-111    p-111
1  i-222    p-222
2  i-333    p-333
If there's a way to change the way the data is written, separate all the records with a line break; that way you can read the data simply, line by line. If not, build a scanner object that takes "}" as a delimiter and use the scanner to read. That would do the job.
You can find each valid JSON object by counting the braces. Assuming the file starts with a {, this Python snippet should work:
import json

def read_block(stream):
    open_brackets = 0
    block = ''
    while True:
        c = stream.read(1)
        if not c:
            break
        if c == '{':
            open_brackets += 1
        elif c == '}':
            open_brackets -= 1
        block += c
        if open_brackets == 0:
            yield block
            block = ''

if __name__ == "__main__":
    with open('firehose_json_blob', 'r') as f:
        for block in read_block(f):
            record = json.loads(block)
            print(record)
This problem can be solved with a JSON parser that consumes objects one at a time from a stream. The raw_decode method of the JSONDecoder exposes just such a parser, but I've written a library that makes it straightforward to do this with a one-liner.
from firehose_sipper import sip

for entry in sip(bucket=..., key=...):
    do_something_with(entry)
I've added some more details in this blog post
In Spark, we had the same problem. We're using the following:
import re

from pyspark.sql.functions import *

@udf
def concatenated_json_to_array(text):
    final = "["
    separator = ""
    for part in text.split("}{"):
        final += separator + part
        separator = "}{" if re.search(r':\s*"([^"]|(\\"))*$', final) else "},{"
    return final + "]"

def read_concatenated_json(path, schema):
    return (spark.read
            .option("lineSep", None)
            .text(path)
            .withColumn("value", concatenated_json_to_array("value"))
            .withColumn("value", from_json("value", schema))
            .withColumn("value", explode("value"))
            .select("value.*"))
It works as follows:
Read the data as one string per file (no delimiters!).
Use a UDF to wrap it in a JSON array, splitting the JSON objects by inserting commas. Note: be careful not to break any strings with }{ in them!
Parse the JSON with a schema into DataFrame fields.
Explode the array into separate rows.
Expand the value object into columns.
Use it like this:
from pyspark.sql.types import *

schema = ArrayType(
    StructType([
        StructField("type", StringType(), True),
        StructField("value", StructType([
            StructField("id", IntegerType(), True),
            StructField("joke", StringType(), True),
            StructField("categories", ArrayType(StringType()), True)
        ]), True)
    ])
)

path = '/mnt/my_bucket_name/messages/*/*/*/*/'
df = read_concatenated_json(path, schema)
I've written more details and considerations here: Parsing JSON data from S3 (Kinesis) with Spark. Do not just split by }{, as it can mess up your string data! For example: { "line": "a\"r}{t" }.
You can use the script below. If the streamed data size does not exceed the buffer size you set, each S3 file will contain one pair of brackets ([]) with comma-separated records.
import base64

print('Loading function')

def lambda_handler(event, context):
    output = []
    for record in event['records']:
        print(record['recordId'])
        payload = base64.b64decode(record['data']).decode('utf-8') + ',\n'
        # Do custom processing on the payload here
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(payload.encode('utf-8'))
        }
        output.append(output_record)
    last = len(event['records']) - 1
    print('Successfully processed {} records.'.format(len(event['records'])))
    start = '[' + base64.b64decode(output[0]['data']).decode('utf-8')
    # Strip the trailing comma from the last record before closing the array
    end = base64.b64decode(output[last]['data']).decode('utf-8').rstrip(',\n') + '\n]'
    output[0]['data'] = base64.b64encode(start.encode('utf-8'))
    output[last]['data'] = base64.b64encode(end.encode('utf-8'))
    return {'records': output}
Using a JavaScript regex:
JSON.parse(`[${item.replace(/}\s*{/g, '},{')}]`);