Parse from log file in python - json

I have a log file with an arbitrary number of lines and JSON strings. All I need is to extract one JSON object from the log file, BUT ONLY AFTER '_____GP D_____'. I do not want any other lines or JSON data from the file.
This is how my input file looks:
INFO:modules.gp.helpers.parameter_getter:_____GP D_____
{'from_time': '2017-07-12 19:57', 'to_time': '2017-07-12 20:57', 'consig_number': 'dup1', 'text': 'r155', 'mobile': None, 'email': None}
ERROR:modules.common.actionexception:ActionError: [{'other': 'your request already crossed threshold time'}]
{'from_time': '2016-07-12 16:57', 'to_time': '2016-07-12 22:57', 'consig_number': 'dup2', 'text': 'r15', 'mobile': None, 'email': None}
How do I find the JSON string only after '_____GP D_____'?

You can read your file line by line until you encounter _____GP D_____ at the end of a line, and when you do, pick up just the next line:
found_json = None
with open("input.log", "r") as f:  # open your log file
    for line in f:  # read it line by line
        if line.rstrip()[-14:] == "_____GP D_____":  # if a line ends with our string...
            found_json = next(f).rstrip()  # grab the next line
            break  # stop reading the file, nothing more of interest
Then you can do whatever you want with your found_json, including parsing it, printing it, etc.
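For example, a minimal parsing sketch: note that the logged line is a Python dict literal (single quotes, None) rather than strict JSON, so ast.literal_eval is a safer fit than json.loads here:
import ast

if found_json is not None:
    data = ast.literal_eval(found_json)  # safely parse the Python-style dict
    print(data["consig_number"])  # -> 'dup1'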
UPDATE - If you want to continuously 'follow' your log file (akin to the tail -f command), you can keep the file handle open and read it line by line with a reasonable delay added between reads (that's largely how tail -f does it, too). Then you can use the same procedure to discover when your desired line occurs and capture the next line to process, send to some other process, or do whatever you plan to do with it. Something like:
import time

capture = False  # a flag to use to signal the capture of the next line
found_lines = []  # a list to store our found lines, just as an example
with open("input.log", "r") as f:  # open the file for reading...
    while True:  # loop indefinitely
        line = f.readline()  # grab a line from the file
        if line != '':  # if there is some content on the current line...
            if capture:  # capture the current line
                found_lines.append(line.rstrip())  # store the found line
                # instead, you can do whatever you want with the captured line
                # i.e. to print it: print("Found: {}".format(line.rstrip()))
                capture = False  # reset the capture flag
            elif line.rstrip()[-14:] == "_____GP D_____":  # if it ends in '_____GP D_____'..
                capture = True  # signal that the next line should be captured
        else:  # an empty buffer encountered, most probably EOF...
            time.sleep(1)  # ... let's wait for a second before attempting to read again...
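One design note, under an assumption about intent: if you only care about lines written after the watcher starts (akin to tail -n 0 -f), you could jump to the end of the file before entering the loop:
import os

with open("input.log", "r") as f:
    f.seek(0, os.SEEK_END)  # skip everything already in the file
    # ... then run the same while True loop as above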

Another approach: scan the lines with a "gate" flag and parse the line that immediately follows the key string:
import json
from ast import literal_eval

KEY_STRING = '''_____GP D_____'''
text = """INFO:modules.gp.helpers.parameter_getter:_____GP D_____
{'from_time': '2017-07-12 19:57', 'to_time': '2017-07-12 20:57', 'consig_number': 'dup1', 'text': 'r155', 'mobile': None, 'email': None}
ERROR:modules.common.actionexception:ActionError: [{'other': 'your request already crossed threshold time'}]
{'from_time': '2016-07-12 16:57', 'to_time': '2016-07-12 22:57', 'consig_number': 'dup2', 'text': 'r15', 'mobile': None, 'email': None}"""
lines = text.split("\n")  # load log text into a list.
# loading from an actual log file would be more like:
# with open("/var/log/syslog.log", 'r') as f:
#     lines = f.readlines()

# set "gate" flag to False
flag = False
for loop in lines:
    line = loop.strip()
    if flag:  # "gate" opened
        # It depends on how the dictionary was streamed to the log:
        # you could use json.loads(line), but if it was not written with
        # json.dumps then you have a Python dictionary literal, and
        # literal_eval is the way to load it into a variable.
        target_json = literal_eval(line)
        print(json.dumps(target_json, indent=4))
    if KEY_STRING in line:
        flag = True  # KEY_STRING found, open the "gate"
    else:
        flag = False  # close the "gate"
Output:
{
    "consig_number": "dup1",
    "text": "r155",
    "email": null,
    "mobile": null,
    "to_time": "2017-07-12 20:57",
    "from_time": "2017-07-12 19:57"
}

Related

Save json list in a text file

I have a JSON log file that I rearranged to be correct; after this I am trying to save the results to the same file. The results are a list, but the problem is that I am unable to save it and get the following error:
write() argument must be str, not list
Here is the code itself:
import regex as re
import re

f_name = 'test1.txt'
splitter = r'"Event\d+":{(.*?)}'  # a search pattern to capture the stuff in braces

# Open the file as Read.
with open(f_name, 'r') as src:
    data = src.readlines()

# tokenize the data source...
tokens = re.findall(splitter, str(data))
#print(tokens)

# now we can operate on the tokens and split them up into key-value pairs and put them into a list
result = []
for token in tokens:
    # make an empty dictionary to hold the row elements
    line_dict = {}
    # we can split the line (token) by comma to get the key-value pairs
    pairs = token.split(',')
    for pair in pairs:
        # another regex split needed here, because the timestamps have colons too
        splitter = r'"(.*)"\s*:\s*"(.*)"'  # capture two groups of things in quotes on opposite sides of colon
        parts = re.search(splitter, pair)
        key, value = parts.group(1), parts.group(2)
        line_dict[key] = value
    # add the dictionary of line elements to the result
    result.append(line_dict)

with open(f_name, 'w') as src:
    for line in result:
        src.write(result)
To be clear, the code itself was not written by me -> Log file management with python (thanks AirSquid)
Thanks for the assistance, new to Python here.
I tried to import json and use json.dump, and also tried appending the text, but in most cases I end up with just [] or an empty file.
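For reference, a minimal hedged sketch of the fix, assuming the goal is simply to persist result: write() only accepts strings, so serialize the list with json.dump instead of passing the list object itself:
import json

# result is the list of dictionaries built above
with open(f_name, 'w') as src:
    json.dump(result, src, indent=2)  # writes the whole list as JSON text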

json errors when appending data with Python

Good day.
I have a small password generator program and I want to save the created passwords into a JSON file (appending each time) so I can add them to an SQLite3 database.
Just trying to implement the append functionality, I receive several errors that I don't understand.
Here are the errors I receive and below that is the code itself.
I'm quite new to Python so additional details are welcomed.
Traceback (most recent call last):
  File "C:\Users\whitmech\OneDrive - Six Continents Hotels, Inc\04 - Python\02_Mosh_Python_Course\Py_Projects\PWGenerator.py", line 32, in <module>
    data = json.load(file)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.1264.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
import random
import string
import sqlite3
import json
from pathlib import Path

print('hello, Welcome to Password generator!')
# input the length of password
length = int(input('\nEnter the length of password: '))
# define data
lower = string.ascii_lowercase
upper = string.ascii_uppercase
num = string.digits
symbols = string.punctuation
# string.ascii_letters
# combine the data
all = lower + upper + num + symbols
# use random
temp = random.sample(all, length)
# create the password
password = "".join(temp)

filename = 'saved.json'
entry = {password}
with open(filename, "r+") as file:
    data = json.load(file)
    data.append(entry)
    file.seek(0)
    json.dump(data, file)

# print the password
print(password)
Update: I've changed the JSON code as directed and it works, but when trying to do the SQLite3 code I'm now receiving a TypeError
Code:
with open(filename, "r+") as file:
    try:
        data = json.load(file)
        data.append(entry)
    except json.decoder.JSONDecodeError as e:
        data = entry
    file.seek(0)
    json.dump(data, file)

# print the password
print(password)

store = input('Would you like to store the password? ')
if store == "Yes":
    pwStored = json.loads(Path("saved.json").read_text())
    with sqlite3.connect("db.pws") as conn:
        command = "INSERT INTO Passwords VALUES (?)"
        for i in pwStored:
            conn.execute(command, tuple(i.values))  # Error with this code
        conn.commit()
else:
    exit()
Error:
AttributeError: 'str' object has no attribute 'values'
The error is because your JSON file is empty; you need to update the following block:
entry = [password]
with open(filename, "r+") as file:
    try:
        data = json.load(file)
        data.extend(entry)
    except json.decoder.JSONDecodeError as e:
        data = entry
    file.seek(0)
    json.dump(data, file)
Also, you are adding the password to a set (i.e., entry = {password}), which will again throw an error: TypeError: Object of type set is not JSON serializable.
So you need to convert that to either a list or a dict.
Note: here I have used entry as a list.
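A quick illustration of the serializability difference, plus a hedged note on the follow-up AttributeError: each item loaded back from saved.json is a plain string, so i.values (a dict method) does not exist on it; passing the string as a one-element tuple such as conn.execute(command, (i,)) would match the single-column INSERT.
import json

json.dumps(["hunter2"])  # works: returns '["hunter2"]'
json.dumps({"hunter2"})  # raises TypeError: Object of type set is not JSON serializable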

tinydb: Empty query was evaluated

One of the things the code below does is put different student IDs into a TinyDB database after checking whether the new ID is already present.
Code's below -
# enroll.py
# USAGE
# python enroll.py --id S1901 --name somename --conf config/config.json

# import the necessary packages
from pyimagesearch.utils import Conf
from imutils.video import VideoStream
from tinydb import TinyDB
from tinydb import where
import face_recognition
import argparse
import imutils
import pyttsx3
import time
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--id", required=True,
    help="Unique student ID of the student")
ap.add_argument("-n", "--name", required=True,
    help="Name of the student")
ap.add_argument("-c", "--conf", required=True,
    help="Path to the input configuration file")
args = vars(ap.parse_args())

# load the configuration file
conf = Conf(args["conf"])

# initialize the database and student table objects
db = TinyDB(conf["db_path"])
studentTable = db.table("student")

# retrieve student details from the database
student = studentTable.search(where(args["id"]))

# check if an entry for the student id does *not* exist, if so, then
# enroll the student
if len(student) == 0:
    # initialize the video stream and allow the camera sensor to warmup
    print("[INFO] warming up camera...")
    vs = VideoStream(src=0).start()
    time.sleep(2.0)

    # initialize the number of face detections and the total number
    # of images saved to disk
    faceCount = 0
    total = 0

    # ask the student to stand in front of the camera
    print("{} please stand in front of the camera until you" \
        "receive further instructions".format(args["name"]))

    # initialize the status as detecting
    status = "detecting"

    # create the directory to store the student's data
    os.makedirs(os.path.join(conf["dataset_path"], conf["class"],
        args["id"]), exist_ok=True)

    # loop over the frames from the video stream
    while True:
        # grab the frame from the threaded video stream, resize it (so
        # face detection will run faster), flip it horizontally, and
        # finally clone the frame (just in case we want to write the
        # frame to disk later)
        frame = vs.read()
        frame = imutils.resize(frame, width=400)
        frame = cv2.flip(frame, 1)
        orig = frame.copy()

        # convert the frame from BGR (OpenCV ordering) to dlib
        # ordering (RGB) and detect the (x, y)-coordinates of the
        # bounding boxes corresponding to each face in the input image
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        boxes = face_recognition.face_locations(rgb,
            model=conf["detection_method"])

        # loop over the face detections
        for (top, right, bottom, left) in boxes:
            # draw the face detections on the frame
            cv2.rectangle(frame, (left, top), (right, bottom),
                (0, 255, 0), 2)

            # check if the total number of face detections are less
            # than the threshold, if so, then skip the iteration
            if faceCount < conf["n_face_detection"]:
                # increment the detected face count and set the
                # status as detecting face
                faceCount += 1
                status = "detecting"
                continue

            # save the frame to the correct path and increment the total
            # number of images saved
            p = os.path.join(conf["dataset_path"], conf["class"],
                args["id"], "{}.png".format(str(total).zfill(5)))
            cv2.imwrite(p, orig[top:bottom, left:right])
            total += 1

            # set the status as saving frame
            status = "saving"

        # draw the status on to the frame
        cv2.putText(frame, "Status: {}".format(status), (10, 20),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

        # show the output frame
        cv2.imshow("Frame", frame)
        cv2.waitKey(1)

        # if the required number of faces are saved then break out from
        # the loop
        if total == conf["face_count"]:
            # let the student know that face enrolling is over
            print("Thank you {} you are now enrolled in the {} " \
                "class.".format(args["name"], conf["class"]))
            break

    # insert the student details into the database
    studentTable.insert({args["id"]: [args["name"], "enrolled"]})

    # print the total faces saved and do a bit of cleanup
    print("[INFO] {} face images stored".format(total))
    print("[INFO] cleaning up...")
    cv2.destroyAllWindows()
    vs.stop()

# otherwise, an entry for the student id exists
else:
    # get the name of the student
    name = student[0][args["id"]][0]
    print("[INFO] {} has already been enrolled...".format(
        name))

# close the database
db.close()
ISSUE:
When I run this code for the 1st time, everything works fine.
>> python3 enroll.py --id S1111 --name thor --conf config/config.json
I get my ID in my json file as shown below -
{"student": {"1": {"S1111": ["thor", "enrolled"]}}}
But when I try to put another ID -
python3 enroll.py --id S1112 --name hulk --conf config/config.json
I get the following error -
ERROR:
Traceback (most recent call last):
  File "enroll.py", line 35, in <module>
    student = studentTable.search(where(args["id"]))
  File "/usr/lib/python3.5/site-packages/tinydb/table.py", line 222, in search
    docs = [doc for doc in self if cond(doc)]
  File "/usr/lib/python3.5/site-packages/tinydb/table.py", line 222, in <listcomp>
    docs = [doc for doc in self if cond(doc)]
  File "/usr/lib/python3.5/site-packages/tinydb/queries.py", line 59, in __call__
    return self._test(value)
  File "/usr/lib/python3.5/site-packages/tinydb/queries.py", line 136, in notest
    raise RuntimeError('Empty query was evaluated')
RuntimeError: Empty query was evaluated
If I change my table name from student to something else, it will again store an ID only the first time and then give the same error. I'm not sure what's wrong here.
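For what it's worth, a hedged reading of the traceback: where(args["id"]) builds a query that names a field but attaches no test, and TinyDB raises RuntimeError('Empty query was evaluated') the first time such a query is applied to a document. On the first run the student table is empty, so the query is never applied; once one record exists, search() evaluates it against that record and fails. A minimal sketch of the usual fix is to attach an explicit test, e.g. an existence check:
from tinydb import TinyDB, where

db = TinyDB("db.json")  # hypothetical path, for illustration
studentTable = db.table("student")

student_id = "S1112"  # stand-in for args["id"]
# ask whether any document has this ID as a key,
# instead of passing a bare where(...) to search()
student = studentTable.search(where(student_id).exists())
In enroll.py that would mean studentTable.search(where(args["id"]).exists()).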

How to use mock_open with json.load()?

I'm trying to get a unit test working that validates a function that reads credentials from a JSON-encoded file. Since the credentials themselves aren't fixed, the unit test needs to provide some and then test that they are correctly retrieved.
Here is the credentials function:
def read_credentials():
    basedir = os.path.dirname(__file__)
    with open(os.path.join(basedir, "authentication.json")) as f:
        data = json.load(f)
    return data["bot_name"], data["bot_password"]
and here is the test:
def test_credentials(self):
    with patch("builtins.open", mock_open(
        read_data='{"bot_name": "name", "bot_password": "password"}\n'
    )):
        name, password = shared.read_credentials()
    self.assertEqual(name, "name")
    self.assertEqual(password, "password")
However, when I run the test, the json code blows up with a decode error. Looking at the json code itself, I'm struggling to see why the mock test is failing because json.load(f) simply calls f.read() then calls json.loads().
Indeed, if I change my authentication function to the following, the unit test works:
def read_credentials():
    # Read the authentication file from the current directory and create a
    # HTTPBasicAuth object that can then be used for future calls.
    basedir = os.path.dirname(__file__)
    with open(os.path.join(basedir, "authentication.json")) as f:
        content = f.read()
    data = json.loads(content)
    return data["bot_name"], data["bot_password"]
I don't necessarily mind leaving my code in this form, but I'd like to understand if I've got something wrong in my test that would allow me to keep my function in its original form.
Stack trace:
Traceback (most recent call last):
  File "test_shared.py", line 56, in test_credentials
    shared.read_credentials()
  File "shared.py", line 60, in read_credentials
    data = json.loads(content)
  File "/home/philip/.local/share/virtualenvs/atlassian-webhook-basic-3gOncDp4/lib/python3.6/site-packages/flask/json/__init__.py", line 205, in loads
    return _json.loads(s, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 367, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I had the same issue and got around it by mocking json.load and builtins.open:
import json
from unittest.mock import patch, MagicMock

# I don't care about the actual open
p1 = patch("builtins.open", MagicMock())
m = MagicMock(side_effect=[{"foo": "bar"}])
p2 = patch("json.load", m)

with p1 as p_open:
    with p2 as p_json_load:
        f = open("filename")
        print(json.load(f))
Result:
{'foo': 'bar'}
I had the exact same issue and solved it. Full code below, first the function to test, then the test itself.
The original function I want to test loads a json file that is structured like a dictionary, and checks to see if there's a specific key-value pair in it:
def check_if_file_has_real_data(filepath):
    with open(filepath, "r") as f:
        data = json.load(f)
    if "fake" in data["the_data"]:
        return False
    else:
        return True
But I want to test this without loading any actual file, exactly as you describe. Here's how I solved it:
from my_module import check_if_file_has_real_data
import mock

@mock.patch("my_module.json.load")
@mock.patch("my_module.open")
def test_check_if_file_has_real_data(mock_open, mock_json_load):
    mock_json_load.return_value = dict({"the_data": "This is fake data"})
    assert check_if_file_has_real_data("filepath") == False

    mock_json_load.return_value = dict({"the_data": "This is real data"})
    assert check_if_file_has_real_data("filepath") == True
The mock_open object isn't called explicitly in the test function, but if you don't include that decorator and argument, you get a filepath error when the with open part of check_if_file_has_real_data tries to run using the actual open function rather than the MagicMock object that's been passed in.
Then you override the response provided by the json.load mock with whatever you want to test.
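A small portability note: the example above imports the third-party mock package; on Python 3.3+ the same patch decorators ship in the standard library, so the import could just as well be:
from unittest import mock  # standard-library equivalent of `import mock`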

Serialize in JSON a base64 encoded data

I'm writing a script to automate data generation for a demo and I need to serialize in a JSON some data. Part of this data is an image, so I encoded it in base64, but when I try to run my script I get:
Traceback (most recent call last):
  File "lazyAutomationScript.py", line 113, in <module>
    json.dump(out_dict, outfile)
  File "/usr/lib/python3.4/json/__init__.py", line 178, in dump
    for chunk in iterable:
  File "/usr/lib/python3.4/json/encoder.py", line 422, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 396, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.4/json/encoder.py", line 429, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.4/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'iVBORw0KGgoAAAANSUhEUgAADWcAABRACAYAAABf7ZytAAAABGdB...
...
BF2jhLaJNmRwAAAAAElFTkSuQmCC' is not JSON serializable
As far as I know, a base64-encoded-whatever (a PNG image, in this case) is just a string, so it should pose no problem to serialization. What am I missing?
You must be careful about the datatypes.
If you read a binary image, you get bytes.
If you encode these bytes in base64, you get ... bytes again! (see documentation on b64encode)
json can't handle raw bytes, that's why you get the error.
I have just written an example, with comments; I hope it helps:
from base64 import b64encode
from json import dumps

ENCODING = 'utf-8'
IMAGE_NAME = 'spam.jpg'
JSON_NAME = 'output.json'

# first: reading the binary stuff
# note the 'rb' flag
# result: bytes
with open(IMAGE_NAME, 'rb') as open_file:
    byte_content = open_file.read()

# second: base64 encode read data
# result: bytes (again)
base64_bytes = b64encode(byte_content)

# third: decode these bytes to text
# result: string (in utf-8)
base64_string = base64_bytes.decode(ENCODING)

# optional: doing stuff with the data
# result here: some dict
raw_data = {IMAGE_NAME: base64_string}

# now: encoding the data to json
# result: string
json_data = dumps(raw_data, indent=2)

# finally: writing the json string to disk
# note the 'w' flag, no 'b' needed as we deal with text here
with open(JSON_NAME, 'w') as another_open_file:
    another_open_file.write(json_data)
An alternative solution would be encoding stuff on the fly with a custom encoder:
import json
from base64 import b64encode

class Base64Encoder(json.JSONEncoder):
    # pylint: disable=method-hidden
    def default(self, o):
        if isinstance(o, bytes):
            return b64encode(o).decode()
        return json.JSONEncoder.default(self, o)
Having that defined you can do:
m = {'key': b'\x9c\x13\xff\x00'}
json.dumps(m, cls=Base64Encoder)
It will produce:
'{"key": "nBP/AA=="}'
What am I missing?
The error is yelling that a bytes object is not JSON serializable.
from base64 import b64encode
# *binary representation* of the base64 string
assert b64encode(b"binary content") == b'YmluYXJ5IGNvbnRlbnQ='
# base64 string
assert b64encode(b"binary content").decode('utf-8') == 'YmluYXJ5IGNvbnRlbnQ='
The latter is definitely "JSON serializable" because it is the base64 string representation of the binary b"binary content".
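To close the loop, a minimal round-trip sketch under the same assumptions (a hypothetical binary payload, standard library only):
import json
from base64 import b64encode, b64decode

payload = b"\x89PNG\r\n\x1a\n"  # hypothetical binary content (PNG magic bytes)

# encode to a JSON-serializable str, then dump
blob = json.dumps({"image": b64encode(payload).decode("ascii")})

# load back and decode to recover the original bytes
restored = b64decode(json.loads(blob)["image"])
assert restored == payload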