How to work with JSON in python - json

My Python Script gives a JSON output. How can I see it in the proper JSON format?
I tried with parsing with json.dumps() and json.loads(), but could not achieve the desired result.
======= Myscript.py ========
import sys
import jenkins
import json
import credentials
# Credentails
username = credentials.login['username']
password = credentials.login['password']
# Print the number of jobs present in jenkins
server = jenkins.Jenkins('http://localhost:8080', username=username, password=password)
# Get the installed Plugin info
plugins = server.get_plugins_info()
#parsed = json.loads(plugins) # take a string as input and returns a dictionary as output.
parsed = json.dumps(plugins) # take a dictionary as input and returns a string as output.
#print(json.dumps(parsed, indent=4, sort_keys=True))
print(plugins)
print(parsed)

It sounds like you want to pretty-print your JSON. You would need to pass the correct parameters to json.dumps():
parsed = json.dumps(plugins, sort_keys=True, indent=4)
Check and see if that is what you are looking for.

Related

Convert a Bytes String to Dictionary in Python

Basic Information
I am creating a python script that can encrypt and decrypt a file with previous session data.
The Problem
I am able to decrypt my file and read it using a key. This returns a bytes string which I can in turn convert to a string. However, this string needs to be converted to a dictionary, which I cannot do. Using ast, json and eval I have run into errors.
Bytes string
decrypted = fernet.decrypt(encrypted)
String
string = decrypted.decode("UTF-8").replace("'", '"')
If I use eval() or ast.literal_eval() I get the following error:
Then I tried using json.loads() and I get the following error:
The information blocked out on both images is to protect my SSH connections. In the first image it is giving me a SyntaxError at the last digit of my ip address.
The Function
The function that is responsible for this when called looks like this:
def FileDecryption():
with open('enc_key.key', 'rb') as filekey:
key = filekey.read()
filekey.close()
fernet = Fernet(key)
with open('saved_data.txt', 'rb') as enc_file:
encrypted = enc_file.read()
enc_file.close()
decrypted = fernet.decrypt(encrypted)
print(decrypted)
string = decrypted.decode("UTF-8").replace("'", '"')
data = f'{string}'
print(data)
#data = eval(data)
data = json.loads(data)
print(type(data))
for key in data:
#command_string = ["load", data[key][1], data[key][2], data[key][3], data[key][4]]
#SSH.CreateSSH(command_string)
print(key)
Any help would be appreciated. Thanks!
Your data seems like it was written incorrectly in the first place, but without a complete example hard to say.
Here's a complete example that round-trips a JSON-able data object.
# requirement:
# pip install cryptography
from cryptography.fernet import Fernet
import json
def encrypt(data, data_filename, key_filename):
key = Fernet.generate_key()
with open(key_filename, 'wb') as file:
file.write(key)
fernet = Fernet(key)
encrypted = fernet.encrypt(json.dumps(data).encode())
with open(data_filename, 'wb') as file:
file.write(encrypted)
def decrypt(data_filename, key_filename):
with open(key_filename, 'rb') as file:
key = file.read()
fernet = Fernet(key)
with open(data_filename, 'rb') as file:
return json.loads(fernet.decrypt(file.read()))
data = {'key1': 'value1', 'key2': 'value2'}
encrypt(data, 'saved_data.txt', 'enc_key.key')
decrypted = decrypt('saved_data.txt', 'enc_key.key')
print(decrypted)
Output:
{'key1': 'value1', 'key2': 'value2'}

How to parse 2 json files in Apache beam

I have 2 json configuration files to read and want to assign there values to variables. I am creating a data flow job using apache beam but unable to parse those files and assign there values to a variable.
config1.json - { "bucket_name": "mybucket"}
config2.json - { "dataset_name": "mydataset"}
This is the pipeline statements ---- I tried with one JSON file first but even that is not working
with beam.Pipeline(options=pipeline_options) as pipeline:
steps = (pipeline
| "Getdata" >> beam.io.ReadFromText(custom_options.configfile)
| "CUSTOM JSON PARSE" >> beam.ParDo(custom_json_parser(custom_options.configfile))
| "write to GCS" >> beam.io.WriteToText('gs://mynewbucket/outputfile.txt')
)
result = pipeline.run()
result.wait_until_finish()
I also tried creating a function to parse atleast one file. This is a sample method I created but it did not work.
class custom_json_parser(beam.DoFn):
import apache_beam as beam
from apache_beam.io.gcp import gcsio
import logging
def __init__(self, configfile):
self.configfile = configfile
def process(self, configfile):
logging.info("JSON PARSING STARTED")
with beam.io.gcp.gcsio.GcsIO().open(self.configfile, 'r') as f:
for line in f:
data = json.loads(line)
bucket = data.get('bucket_name')
dataset = data.get('dataset_name') ```
Can someone please suggest the best method to resolve this issue in apache beam?
Thanks in Advance
If you need to read only once your files in the pipeline, don't read them in the pipeline, but before running it.
Read the files from GCS
Parse the file and put the useful content in the pipeline options map
Run your pipeline and use the data from the options
EDIT 1
You can use this piece of code to load the file and read it, before your pipeline. Simple Python, standard GCS libraries.
from google.cloud import storage
import json
client = storage.Client()
bucket = client.get_bucket('your-bucket')
blob = bucket.get_blob("name.json")
json_data = blob.download_as_string().decode('UTF-8')
print(json_data) # print -> {"name": "works!!"}
print(json.loads(json_data)["name"]) # print -> works!!
You can try following code snippet: -
Function to Parse File
class custom_json_parser(beam.DoFn):
def process(self, element):
logging.info(element)
data = json.loads(element)
bucket = data.get('bucket_name')
dataset = data.get('dataset_name')
return [{"bucket": bucket , "dataset": dataset }]
Over Pipeline you can call function
with beam.Pipeline(options=pipeline_options) as pipeline:
steps = (pipeline
| "Getdata" >> beam.io.ReadFromText(custom_options.configfile)
| "CUSTOM JSON PARSE" >> beam.ParDo(custom_json_parser())
| "write to GCS" >> beam.io.WriteToText('gs://mynewbucket/outputfile.txt')
)
result = pipeline.run()
result.wait_until_finish()
It will work.

Convert text based key/value pair to JSON format

I have a text file with a lot of key/value pairs in the given format:
secret_key="XXXXX"
database_password="1234"
timout=30
.
.
.
and list continues...
I want these key/value pairs to be stored in a JSON format so that I can make use of this data in the JSON format. Is there any way of doing this. I mean any website or any method to do it automatically?
The Python 3.8 script below would do the job ◡̈
import json
with open('text', 'r') as fp:
dic = {}
while line:=fp.readline().strip():
key, value = line.split('=')
dic[key] = eval(value)
print(json.dumps(dic))
Note: eval is used to prevent double quotes being escaped.
As I guess that is an .env file. So, I would suggest you try to implement something like this in Python:
import json
import sys
try:
dotenv = sys.argv[1]
except IndexError as e:
dotenv = '.env'
with open(dotenv, 'r') as f:
content = f.readlines()
# removes whitespace chars like '\n' at the end of each line
content = [x.strip().split('=') for x in content if '=' in x]
print(json.dumps(dict(content)))
Reference: https://gist.github.com/GabLeRoux/d6b2c2f7a69ebcd8430ea59c9bcc62c0
*Please let me know if you want to implement it in a different language, such as JavaScript.

Import Kaggle csv from download url to pandas DataFrame

I've been trying different methods to import the SpaceX missions csv file on Kaggle directly into a pandas DataFrame, without any success.
I'd need to send requests to login. This is what I have so far:
import requests
import pandas as pd
from io import StringIO
# Link to the Kaggle data set & name of zip file
login_url = 'http://www.kaggle.com/account/login?ReturnUrl=/spacex/spacex-missions/downloads/database.csv'
# Kaggle Username and Password
kaggle_info = {'UserName': "user", 'Password': "pwd"}
# Login to Kaggle and retrieve the data.
r = requests.post(login_url, data=kaggle_info, stream=True)
df = pd.read_csv(StringIO(r.text))
r is returning the html content of the page.
df = pd.read_csv(url) gives a CParser error:
CParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 6
I've searched for a solution, but so far nothing I've tried worked.
You are creating a stream and passing it directly to pandas. I think you need to pass a file like object to pandas. Take a look at this answer for a possible solution (using post and not get in the request though).
Also i think the login url with redirect that you use is not working as it is. I know i suggested that here. But i ended up not using is because the post request call did not handle the redirect (i suspect).
The code i ended up using in my project was this:
def from_kaggle(data_sets, competition):
"""Fetches data from Kaggle
Parameters
----------
data_sets : (array)
list of dataset filenames on kaggle. (e.g. train.csv.zip)
competition : (string)
name of kaggle competition as it appears in url
(e.g. 'rossmann-store-sales')
"""
kaggle_dataset_url = "https://www.kaggle.com/c/{}/download/".format(competition)
KAGGLE_INFO = {'UserName': config.kaggle_username,
'Password': config.kaggle_password}
for data_set in data_sets:
data_url = path.join(kaggle_dataset_url, data_set)
data_output = path.join(config.raw_data_dir, data_set)
# Attempts to download the CSV file. Gets rejected because we are not logged in.
r = requests.get(data_url)
# Login to Kaggle and retrieve the data.
r = requests.post(r.url, data=KAGGLE_INFO, stream=True)
# Writes the data to a local file one chunk at a time.
with open(data_output, 'wb') as f:
# Reads 512KB at a time into memory
for chunk in r.iter_content(chunk_size=(512 * 1024)):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
Example use:
sets = ['train.csv.zip',
'test.csv.zip',
'store.csv.zip',
'sample_submission.csv.zip',]
from_kaggle(sets, 'rossmann-store-sales')
You might need to unzip the files.
def _unzip_folder(destination):
"""Unzip without regards to the folder structure.
Parameters
----------
destination : (str)
Local path and filename where file is should be stored.
"""
with zipfile.ZipFile(destination, "r") as z:
z.extractall(config.raw_data_dir)
So i never really directly loaded it into the DataFrame, but rather stored it to disk first. But you could modify it to use a temp directory and just delete the files after you read them.

How do I grab info from this json file?

I'm trying to grab some numbers from this json file, but I don't how to do it correctly. This is the json file I am trying to gather information from:
http://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=
I've been trying to get this code to work, but I can't figure it out:
import json
from pprint import pprint
with open('data.json') as data_file:
data = json.load(data_file)
data["rowSet"] ["1610612737"] ["Atlanta Hawks"]
I'm trying to get the statistics from each team.
The following Python script should do it.
#!/usr/bin/env python
import json
with open('leaguedashteamstats.json') as data_file:
data = json.load(data_file)
# extract headers names
headers = data['resultSets'][0]['headers']
# extract raw json rows
raw_rows = data['resultSets'][0]['rowSet']
team_stats = []
for row in raw_rows:
print row[1] # prints team name
# mixes header names and values and prints them out
for (header, value) in zip(headers, row):
print header, value
print '\n'
Both data and code can be seen here:
https://gist.github.com/cevaris/24d0b7d97677667aedb14059a6959da1#file-1-team-stats-output
Disclaimer: this code doesn't contain any validation, but it should lead you in the right direction:
import json
with open('data.json') as data_file:
data = json.load(data_file)
for rs in data.get('resultSets'):
for r_ in [r for r in rs.get('rowSet') if r[1] == 'Atlanta Hawks']:
print(r_)
You basically need to determine specific keys that you are going to loop through, or obtain.
This should hopefully get you to where you need to be.