Background:
I have a JSON dictionary file as follows:
dictionary = {"Qui": "クイ", "Quiana": "キアナ", "Quick": "クイック", "Quickley": "クイックリー", "Quico": "キコ", "Quiej-Alvarez": "クエイ アルバレス", "Quigg": "クイッグ", "Quigley": "クイグリー", "Quijano": "クイジャーノ", "Quik": "クイック", "Quilici": "クイリチ", "Quill": "クィル"}
Then I want to let the user enter as many keys as they like through input, and finally return a formatted string combining each key's value.
Question:
My code so far gets the job done in a very clunky/incomplete manner. Any advice on how to clean up the code and achieve my goal?
Current code:
import json
import sys, math
import codecs

# Part 1
search_term, search_term2 = input("Enter a Name: ").split()
dictionary = {}
keys = dictionary.keys()
values = dictionary.values()
with open('translation.json', 'r', encoding='utf-8-sig') as f:
    term_data = json.load(f)
if search_term.casefold() in term_data:
    word = search_term.title()
elif search_term.title() in term_data:
    word = search_term.title()
output1 = "{}".format(term_data[search_term])

# Part 2
with open('translation.json', 'r', encoding='utf-8-sig') as f:
    term_data2 = json.load(f)
if search_term2.casefold() in term_data2:
    word2 = search_term2.title()
elif search_term2.title() in term_data2:
    word2 = search_term2.title()
# else:
#     print("Name not found in dictionary.")
output2 = "{}".format(term_data2[search_term2])
print("{}・{}".format(output1, output2))
Your current code only accepts exactly two keys, which doesn't meet your original requirement. The version below handles any number of keys and is simpler at the same time:
test.py:
import json

with open('translation.json', 'r', encoding='utf-8-sig') as f:
    term_data = json.load(f)

search_terms = input("Enter a name: ").split()
# the dictionary's keys are title-cased, so normalize each term with title()
l = [term_data[i.title()] for i in search_terms if i.title() in term_data]
print('.'.join(l))
First, we only need to open the JSON file once; I/O operations are expensive, so we should avoid repeating them over and over.
Second, we don't need to repeat the term matching as you do with Part 1 and Part 2. We can do it in a loop; here I use a list comprehension.
Finally, a brief explanation:
split the whole user input into a list: search_terms
loop over the input terms with for i in search_terms
if the candidate term's title()-cased form is a key of term_data, its value is appended to the new list l; if not, the term is skipped. (Note that the dictionary's keys are all title-cased, so a casefold() check could never match them.)
at last, use the separator . to join all the collected elements of the list.
Output:
~$ python3 test.py
Enter a name: Qui Quill Quiana
クイ.クィル.キアナ
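If you also want the "Name not found in dictionary." message from your commented-out else branch, a plain loop is easier to extend than the comprehension. A minimal sketch, reusing term_data and search_terms from test.py above:

results = []
for term in search_terms:
    key = term.title()  # the dictionary's keys are title-cased
    if key in term_data:
        results.append(term_data[key])
    else:
        print("Name not found in dictionary: {}".format(term))
print('.'.join(results))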
Related
Sorry, I'm new to coding in Python. I need to save a JSON payload generated inside a for loop as a CSV file, one file per iteration of the loop.
I wrote code that works fine for generating the first CSV file, but then it gets overwritten on every iteration and I haven't found a solution yet. Can anyone help me? Many thanks.
from twarc.client2 import Twarc2
import itertools
import pandas as pd
import csv
import json
import numpy as np

# Your bearer token here
t = Twarc2(bearer_token="AAAAAAAAAAAAAAAAAAAAA....WTW")

# Get a bunch of user handles you want to check:
list_of_names = np.loadtxt("usernames.txt", dtype="str")

# Get the `data` part of every request only, as one list
def get_data(results):
    return list(itertools.chain(*[result['data'] for result in results]))

user_objects = get_data(t.user_lookup(users=list_of_names, usernames=True))

for user in user_objects:
    following = get_data(t.following(user['id']))
    # Do something with the lists
    print(f"User: {user['username']} Follows {len(following)} -2")
    json_string = json.dumps(following)
    df = pd.read_json(json_string)
    df.to_csv('output_file.csv')
You need to add a sequence number or some other unique identifier to the filename. The clearest example would be to keep track of a counter, or use a GUID. Below I've used a counter that is initialized before your loop and incremented in each iteration. This will produce a series of files like output_file_0.csv, output_file_1.csv, output_file_2.csv and so on.
counter = 0
for user in user_objects:
    following = get_data(t.following(user['id']))
    # Do something with the lists
    print(f"User: {user['username']} Follows {len(following)} -2")
    json_string = json.dumps(following)
    df = pd.read_json(json_string)
    df.to_csv('output_file_' + str(counter) + '.csv')
    counter += 1
We convert the integer to a string and paste it in between the name of your file and its extension. Alternatively, you can let enumerate() keep the counter for you:
from twarc.client2 import Twarc2
import itertools
import pandas as pd
import csv
import json
import numpy as np

# Your bearer token here
t = Twarc2(bearer_token="AAAAAAAAAAAAAAAAAAAAA....WTW")

# Get a bunch of user handles you want to check:
list_of_names = np.loadtxt("usernames.txt", dtype="str")

# Get the `data` part of every request only, as one list
def get_data(results):
    return list(itertools.chain(*[result['data'] for result in results]))

user_objects = get_data(t.user_lookup(users=list_of_names, usernames=True))

for idx, user in enumerate(user_objects):
    following = get_data(t.following(user['id']))
    # Do something with the lists
    print(f"User: {user['username']} Follows {len(following)} -2")
    json_string = json.dumps(following)
    df = pd.read_json(json_string)
    df.to_csv(f'output_file_{idx}.csv')
For homework we have been asked to build a dictionary from a CSV file.
The CSV looks something like this:
David,5,6,10,12,15,20
Micheal,9,15,13,20,5,8
John,1,2,5,8,19,10
I want to convert the CSV to a Python dictionary, but I don't know how to do that.
import csv
from statistics import mean

with open('grades.csv') as FileCsv:
    reader = csv.reader(FileCsv)
    for index in reader:
        name = index[0]
        these_grades = list()
        for lines in index[1:]:
            these_grades.append(int(lines))
        print(mean(these_grades))

# Example
# average = dict()
# print(average['John'])
The output should look like this:
John's Mean = 7.5
Instead of creating a list (these_grades = list()), create the list in a dictionary indexed by name:
students = dict()
...
students[name] = list()
Then append grades to the list:
students[name].append(grade)
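Putting those pieces together, a minimal sketch (assuming the grades.csv sample above; variable names are illustrative):

import csv
from statistics import mean

students = dict()
with open('grades.csv') as file_csv:
    reader = csv.reader(file_csv)
    for row in reader:
        name = row[0]
        students[name] = list()
        for grade in row[1:]:
            students[name].append(int(grade))

# mean([1, 2, 5, 8, 19, 10]) == 7.5 for the sample data
print("{}'s Mean = {}".format('John', mean(students['John'])))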
I am trying to convert a JSON file to CSV format using Python. I am using the json.loads() function and then json_normalize() to flatten the objects. I was wondering if there is a better way of doing this.
This is the input file (one row from it):
{"ID": "02","Date": "2019-08-01","Total": 400,"QTY": 12,"Item": [{"NM": "0000000001","CD": "item_CD1","SRL": "25","Disc": [{"CD": "discount_CD1","Amount": 2}],"TxLns": {"TX": [{"TXNM": "000001-001","TXCD": "TX_CD1"}]}},{"NM": "0000000002","CD": "item_CD2","SRL": "26","Disc": [{"CD": "discount_CD2","Amount": 4}],"TxLns": {"TX": [{"TXNM": "000002-001","TXCD": "TX_CD2"}]}},{"NM": "0000000003","CD": "item_CD3","SRL": "27"}],"Cust": {"CustID": 10,"Email": "01#abc.com"},"Address": [{"FirstName": "firstname","LastName": "lastname","Address": "address"}]}
Code
import json
import pandas as pd
from pandas.io.json import json_normalize

data_final = pd.DataFrame()
with open("sample.json") as f:
    for line in f:
        json_obj = json.loads(line)
        ID = json_obj['ID']
        Item = json_obj['Item']

        dataMain = json_normalize(json_obj)
        dataMain = dataMain.drop(['Item', 'Address'], axis=1)
        # dataMain.to_csv("main.csv", index=False)

        dataItem = json_normalize(json_obj, 'Item', ['ID'])
        dataItem = dataItem.drop(['Disc', 'TxLns.TX'], axis=1)
        # dataItem.to_csv("Item.csv", index=False)

        dataDisc = pd.DataFrame()
        dataTx = pd.DataFrame()
        for rt in Item:
            NM = rt['NM']
            rt['ID'] = ID
            if 'Disc' in rt:
                data = json_normalize(rt, 'Disc', ['NM', 'ID'])
                dataDisc = dataDisc.append(data, sort=False)
            if 'TxLns' in rt:
                tx = rt['TxLns']
                tx['NM'] = NM
                tx['ID'] = ID
                if 'TX' in tx:
                    data = json_normalize(tx, 'TX', ['NM', 'ID'])
                    dataTx = dataTx.append(data, sort=False)

        dataDIS = pd.merge(dataItem, dataDisc, on=['NM', 'ID'], how='left')
        dataTX = pd.merge(dataDIS, dataTx, on=['NM', 'ID'], how='left')

        dataAddress = json_normalize(json_obj, 'Address', ['ID'])
        data_IT = pd.merge(dataMain, dataTX, on=['ID'])

        data_merge = pd.merge(data_IT, dataAddress, on=['ID'])
        data_final = data_final.append(data_merge, sort=False)

data_final = data_final.drop_duplicates(keep='first')
data_final.to_csv("data_merged.csv", index=False)
This is the output:
ID,Date,Total,QTY,Cust.CustID,Cust.Email,NM,CD_x,SRL,CD_y,Amount,TXNM,TXCD,FirstName,LastName,Address
02,2019-08-01,400,12,10,01#abc.com,0000000001,item_CD1,25,discount_CD1,2.0,000001-001,TX_CD1,firstname,lastname,address
02,2019-08-01,400,12,10,01#abc.com,0000000002,item_CD2,26,discount_CD2,4.0,000002-001,TX_CD2,firstname,lastname,address
02,2019-08-01,400,12,10,01#abc.com,0000000003,item_CD3,27,,,,,firstname,lastname,address
The code works fine for now. By "better" I mean:
Is it efficient in terms of time and space complexity? If this code has to process around 10K records in a file, is this an optimized solution?
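One concrete inefficiency stands out (a sketch under stated assumptions, not a definitive rewrite): DataFrame.append copies the entire accumulated frame on every call, so appending inside the loop is quadratic in the number of lines, and the method has since been removed in pandas 2.0. Collecting the per-line frames in a plain list and concatenating once at the end avoids both problems; here the normalize-and-drop line is a stand-in for the full per-line merge logic above:

import json
import pandas as pd

frames = []
with open("sample.json") as f:
    for line in f:
        json_obj = json.loads(line)
        # stand-in for the full per-line merge logic above; in the real
        # script this would be the data_merge frame
        data_merge = pd.json_normalize(json_obj).drop(['Item', 'Address'], axis=1)
        frames.append(data_merge)

# one concatenation instead of one append per line
data_final = pd.concat(frames, ignore_index=True, sort=False)
data_final = data_final.drop_duplicates(keep='first')
data_final.to_csv("data_merged.csv", index=False)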
For Python 3.
How would you approach the following problem? (I did not find anything like this in another post.)
I need to open/load 72 different .json files and assign each one of them to a variable, like this:
import json

with open('/Users/Data/netatmo_20171231_0000.json') as f:
    d1 = json.load(f)
with open('/Users/Data/netatmo_20171231_0010.json') as f:
    d2 = json.load(f)
with open('/Users/Data/netatmo_20171231_0020.json') as f:
    d3 = json.load(f)
with open('/Users/Data/netatmo_20171231_0030.json') as f:
    d4 = json.load(f)
with open('/Users/Data/netatmo_20171231_0040.json') as f:
    d5 = json.load(f)
with open('/Users/Data/netatmo_20171231_0050.json') as f:
    d6 = json.load(f)
with open('/Users/Data/netatmo_20171231_0100.json') as f:
    d7 = json.load(f)
with open('/Users/Data/netatmo_20171231_0110.json') as f:
    d8 = json.load(f)
with open('/Users/Data/netatmo_20171231_0120.json') as f:
    d9 = json.load(f)
with open('/Users/Data/netatmo_20171231_0130.json') as f:
    d10 = json.load(f)
But I don't want to do this 72 times (and I also think it is inefficient).
At the end I will create a pandas DataFrame, but first I need the JSONs in variables because I'm applying a function to them to flatten the data (these JSONs are very nested).
I also tried joining the JSON files, which worked, but the resulting JSON is 5 GB and my PC takes 12 hours to load it, so this is not an option.
Thanks, and kind regards.
First, find out where your bottlenecks are.
If it is the JSON decoding/encoding step, try switching to ultrajson.
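ultrajson (the ujson package) mirrors the standard library's load/loads/dump/dumps functions for the common cases, so trying it can be a one-line change, assuming the package is installed:

import ujson as json  # drop-in for the common json.load/json.loads calls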
I have not tested it, but one way you could improve things is via multiple processes.
import os
import json
from multiprocessing import Pool

# wrap your json importer in a function that can be mapped
def read_json(json_path):
    with open(json_path) as f:
        return json.load(f)

def main():
    # set up your pool
    pool = Pool(processes=8)  # or whatever your hardware can support

    # get a list of file paths
    path_to_json = '/Users/Data/'
    file_list = [os.path.join(path_to_json, name)
                 for name in os.listdir(path_to_json)
                 if name.endswith('.json')]

    # one loaded JSON object per file, in the same order as file_list
    loaded = pool.map(read_json, file_list)

if __name__ == '__main__':
    main()
@OzkanSener Thanks again for the reply, and for the tip. As you said, first I needed to identify my bottleneck. The bottleneck was memory consumption, so the method you suggested did not help much. Instead I did the following:
with open('/Users/Data/netatmo_20171231_0000.json') as f:
    d = json.load(f)
data1 = [flatten(i) for i in d]

with open('/Users/Data/netatmo_20171231_0010.json') as f:
    d = json.load(f)
data2 = [flatten(i) for i in d]

with open('/Users/Data/netatmo_20171231_0020.json') as f:
    d = json.load(f)
data3 = [flatten(i) for i in d]
And so on, reusing the d variable instead of creating new ones all the time.
At the end I can create only one big list:
from itertools import chain
data = list(chain(data1, data2, data3))
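If the 72 separate variables aren't actually needed, the same reuse-one-variable idea collapses into a loop over the file names. A minimal sketch, assuming the files all match the naming pattern shown above and reusing the poster's own flatten() helper:

import json
from glob import glob

data = []
for path in sorted(glob('/Users/Data/netatmo_20171231_*.json')):
    with open(path) as f:
        d = json.load(f)  # d is reused, so only one file sits in memory at a time
    data.extend(flatten(i) for i in d)  # flatten() is the poster's own helper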
As mentioned in the title, I'm trying to make a simple Python script that can be run from the terminal to do the following:
Find all JSON files in the current working directory and nested folders (this part works well)
Load said files
Recursively search them for a specific value or a substring
If the value matches, replace it with a new value established by the user
Once finished, save all modified JSON files to a "converted" folder in the current directory.
That said, the issue appears when I try the recursive search method posted below. Since I'm pretty much new to Python, I would appreciate any help with this issue, whatever its cause... either the JSON files I'm using or the search method I'm employing.
Simplifying the issue: the value I search for never matches anything inside the object, be that a key or a plain string value. I've tried multiple methods to perform a recursive search but can't get a match.
For example, taking the sample JSON into account, I want to replace the value "selectable_parts" or "static_parts", or, even deeper in the structure, "1h_mod310_door_00", but it seems my method of searching can't reach that value at object[object][children][0][children][5][name] (hope this helps).
Sample JSON: (https://drive.google.com/open?id=0B2-Bn2b0ujjVdW5YVGg3REg3OWs)
"""KEYWORD REPLACING MODULE."""
import os
import json
# functions
def get_files():
"""lists files"""
exclude = set(['.vscode', 'sample'])
json_files = []
for root, dirs, files in os.walk(os.getcwd(), topdown=True):
dirs[:] = [d for d in dirs if d not in exclude]
for name in files:
if name.endswith('.json'):
json_files.append(os.path.join(root, name))
return json_files
def load_files(json_files):
"""works files"""
for js_file in json_files:
with open(js_file) as json_file:
loaded_json = json.load(json_file)
replace_key_value(loaded_json, os.path.basename(js_file))
def write_file(data_file, new_file_name):
"""writes the file"""
if not os.path.exists('converted'):
os.makedirs('converted')
with open('converted/' + new_file_name, 'w') as json_file:
json.dump(data_file, json_file)
def replace_key_value(js_file, js_file_name):
"""replace and initiate save"""
recursive_replace(js_file, SKEY, '')
# write_file(js_file, js_file_name)
def recursive_replace(data, match, repl):
"""search for needed value and replace its value"""
for key, value in data.items():
if value == match:
print data[key]
print "AHHHHHHHH"
elif isinstance(value, dict):
recursive_replace(value, match, repl)
# main
print "\n" + '- on ' + os.getcwd()
NEW_DIR = raw_input('Work dir (leave empty if current): ')
if not NEW_DIR:
print NEW_DIR
NEW_DIR = os.getcwd()
else:
print NEW_DIR
os.chdir(NEW_DIR)
# get_files()
JS_FILES = get_files()
print '- files on ' + os.getcwd()
# print "\n".join(JS_FILES)
SKEY = raw_input('Value to search: ')
RKEY = raw_input('Replacement value: ')
load_files(JS_FILES)
The issue was the way I navigated the JSON object: the method didn't consider whether a value was a dict or a list (I believe...).
So, to answer my own question, here's the recursive search I'm now using to check the values:
def get_recursively(search_dict, field):
    """
    Takes a dict with nested lists and dicts,
    and searches all dicts for a key of the field
    provided.
    """
    fields_found = []
    for key, value in search_dict.iteritems():
        if key == field:
            print value
            fields_found.append(value)
        elif isinstance(value, dict):
            results = get_recursively(value, field)
            for result in results:
                fields_found.append(result)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    more_results = get_recursively(item, field)
                    for another_result in more_results:
                        fields_found.append(another_result)
    return fields_found
# write_file(js_file, js_file_name)
Hope this helps someone.
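For the replacement half of the task, the same dict-or-list walk can drive an in-place substitution. A minimal sketch of my own (not the poster's final code), written so it runs under both Python 2 and 3:

def recursive_replace(data, match, repl):
    """Walk nested dicts and lists, replacing any value equal to match."""
    if isinstance(data, dict):
        for key, value in data.items():
            if value == match:
                data[key] = repl
            else:
                recursive_replace(value, match, repl)
    elif isinstance(data, list):
        for item in data:
            recursive_replace(item, match, repl)

With that in place, replace_key_value in the module above could call recursive_replace(js_file, SKEY, RKEY) instead of passing '' as the replacement.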