Goal
I'm retrieving complex, nested JSON data from an API. To pull out the specific values I care about, I've written a function that collects all the values for a key I specify. It works well for returning the values as a list, but I now need to return multiple values and associate them with one another so that each result becomes a row in a CSV file. Currently the code just returns a separate list for each key. How would I go about associating them? I've tried Python's zip function but can't get it working properly. I sincerely appreciate any input you can give me.
Extract Function
def json_extract(obj, key):
    """Recursively fetch values from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    values = extract(obj, arr, key)
    return values
Main.py
res = requests.get(prod_url, headers=prod_headers, params=payload)
record_id = json_extract(res.json(), 'record_id')
status = json_extract(res.json(), 'status')
The solution was simple: just use the zip function, e.g. zip(record_id, status).
A syntax error had been preventing it from working before.
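A minimal sketch of that zip approach, writing each pair as one CSV row. The sample lists and the in-memory buffer are assumptions for illustration; in the real script the lists would come from json_extract and the writer would wrap an open file. Note that zip pairs items by position, so the rows only line up if every record carries both keys.

```python
import csv
import io

# Hypothetical values standing in for the lists returned by json_extract().
record_id = [101, 102, 103]
status = ["open", "closed", "open"]

# zip() pairs the lists by position, so each tuple becomes one CSV row.
rows = list(zip(record_id, status))

# An in-memory buffer is used here; a file opened with
# open("out.csv", "w", newline="") can be passed to csv.writer the same way.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["record_id", "status"])  # header row
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

If the lists can differ in length, zip silently stops at the shortest one, so it is worth checking len(record_id) == len(status) before writing.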
Related
From a Django app, I am able to consume data from a separate Restful API, but what about filtering? Below returns all books and its data. But what if I want to grab only books by an author, date, etc.? I want to pass an author's name parameter, e.g. .../authors-name or /?author=name and return only those in the json response. Is this possible?
views.py
def get_books(request):
    response = requests.get('http://localhost:8090/book/list/').json()
    return render(request, 'books.html', {'response': response})
So is there a way to filter like a model object?
I can think of three ways of doing this:
Python's filter could be used with a bit of additional code.
QueryableList, which is the closest to an ORM for lists I've seen.
query-filter, which takes a more functional approach.
1. Built-in filter function
You can write a function that returns a predicate, i.e. a function that tells you whether a list element is a match, and then pass the generated predicate into filter.
def filter_pred_factory(**kwargs):
    def predicate(item):
        for key, value in kwargs.items():
            if key not in item or item[key] != value:
                return False
        return True
    return predicate

def get_books(request):
    books_data = requests.get('http://localhost:8090/book/list/').json()
    # request.GET.dict() gives plain string values; unpacking request.GET
    # directly can pass each value as a list, which would never match.
    pred = filter_pred_factory(**request.GET.dict())
    data_filter = filter(pred, books_data)
    # data_filter is cast to a list as a precaution
    # because it is a filter object,
    # which can only be iterated through once before it's exhausted.
    filtered_data = list(data_filter)
    return render(request, 'books.html', {'books': filtered_data})
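To see the predicate factory on its own, outside Django, here is a small sketch run against a hard-coded list. The sample books and their field names are invented for illustration and are not the real API response.

```python
def filter_pred_factory(**kwargs):
    """Return a predicate that is True when an item matches every key/value pair."""
    def predicate(item):
        return all(key in item and item[key] == value
                   for key, value in kwargs.items())
    return predicate

# Hypothetical sample data standing in for the API response.
books = [
    {"title": "Book A", "author": "Smith", "year": "2010"},
    {"title": "Book B", "author": "Jones", "year": "2010"},
    {"title": "Book C", "author": "Smith", "year": "2015"},
]

# One criterion, then two criteria combined (all must match).
by_smith = list(filter(filter_pred_factory(author="Smith"), books))
smith_2010 = list(filter(filter_pred_factory(author="Smith", year="2010"), books))

print([b["title"] for b in by_smith])
print([b["title"] for b in smith_2010])
```

Query-string values arrive as strings, so a numeric field in the JSON would need converting before an equality test like this can match.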
2. QueryableList
QueryableList would achieve the same as the above, with some extra features. As well as /books?isbn=1933988673, you could use queries like /books?longDescription__icontains=linux. Its documentation covers the other available functionality.
from QueryableList import QueryableListDicts

def get_books(request):
    books_data = requests.get('http://localhost:8090/book/list/').json()
    queryable_books = QueryableListDicts(books_data)
    filtered_data = queryable_books.filter(**request.GET)
    return render(request, 'books.html', {'books': filtered_data})
3. query-filter
query-filter has similar features but doesn't copy the object-oriented approach of an ORM.
from query_filter import q_filter, q_items

def get_books(request):
    books_data = requests.get('http://localhost:8090/book/list/').json()
    data_filter = q_filter(books_data, q_items(**request.GET))
    # filtered_data is cast to a list as a precaution
    # because q_filter returns a filter object,
    # which can only be iterated through once before it's exhausted.
    filtered_data = list(data_filter)
    return render(request, 'books.html', {'books': filtered_data})
It's worth mentioning that I wrote query-filter.
I have a function that looks through a log file. It matches a regular expression in the log file to indicate a new log entry. Once it's done this it then grabs all the information after this point before the next regular expression (which would indicate a new log entry).
For each log entry, some relevant information is placed into a dictionary (error number, error message, etc)
At the end of my createGenerator function I yield mydict because I don't want to store every log entry and then pass it to my second function generatorCheck().
What I want generatorCheck to do is check key, value pairs that have been passed from the createGenerator function. I then want to put all the matching key value pairs into a table. I'm not sure how to do this though as I haven't worked a lot with yield or generators.
def createGenerator():
    mydict = {
        'key1': 'value2',
        'key2': 'value3'
        ...
        ...
    }
    yield mydict

def generatorCheck():
    dict2 = {}
    createGenerator()
    for i in createGenerator():
        if 'key1' or 'key2' in createGenerator():
            # store key, value pair in dict2

generatorCheck()
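One way the two functions could fit together, offered as a sketch only: the generator yields one dictionary per log entry, and the consumer iterates over it once, keeping just the keys of interest. The sample entries, the key names, and the function names create_generator and generator_check are made up for illustration, and the regular-expression parsing is left out.

```python
def create_generator(entries):
    """Yield one dict per log entry (the parsing is assumed to happen elsewhere)."""
    for entry in entries:
        yield entry

def generator_check(entries, wanted_keys):
    """Collect the wanted key/value pairs from each yielded entry into table rows."""
    table = []
    for entry in create_generator(entries):
        # Keep only the keys we care about that are present in this entry.
        row = {key: entry[key] for key in wanted_keys if key in entry}
        if row:
            table.append(row)
    return table

# Hypothetical parsed log entries.
sample_entries = [
    {"error_number": 404, "error_message": "Not found", "host": "a"},
    {"host": "b"},
    {"error_number": 500, "host": "c"},
]

table = generator_check(sample_entries, ("error_number", "error_message"))
print(table)
```

Two things differ from the attempt above: the membership test is made against the yielded dictionary rather than against a fresh generator, and each key is tested separately, because an expression of the form 'key1' or 'key2' in x is always truthy.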
I have data loaded from JSON and am trying to extract arbitrary nested values using a list as input, where the list corresponds to the names of successive children. I want a function get_value(data,lookup) that returns the value from data by treating each entry in lookup as a nested child.
In the example below, when lookup=['alldata','TimeSeries','rates'], the return value should be [1.3241,1.3233].
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def get_value(data, lookup):
    res = data
    for item in lookup:
        res = res[item]
    return res
lookup = ['alldata','TimeSeries','rates']
get_value(json_data,lookup)
My example works, but there are two problems:
It's inefficient - In my for loop, I copy the whole TimeSeries object to res, only to then replace it with the rates list. (Edit: as @Andrej Kesely explained, res is only a reference at each iteration, so the data isn't actually being copied.)
It's not concise - I was hoping to find a concise (e.g. one- or two-line) way of extracting the data, using something like list comprehension syntax.
If you want a one-liner and you are using Python 3.8 or later, you can use an assignment expression (the "walrus operator"):
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def get_value(data, lookup):
    return [data := data[item] for item in lookup][-1]
lookup = ['alldata','TimeSeries','rates']
print( get_value(json_data,lookup) )
Prints:
[1.3241, 1.3233]
I don't think you can do it without a loop, but you could use a reducer here to increase readability.
functools.reduce(dict.get, lookup, json_data)
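A runnable sketch of the reducer idea, using operator.getitem in place of dict.get. That substitution is my own assumption about what is wanted: getitem(obj, key) is simply obj[key], so it also follows list indices and raises KeyError on a missing key, whereas dict.get only works while every intermediate object is a dict.

```python
from functools import reduce
from operator import getitem

json_data = {'alldata': {'name': 'CAD/USD',
                         'TimeSeries': {'dates': ['2018-01-01', '2018-01-02'],
                                        'rates': [1.3241, 1.3233]}}}

def get_value(data, lookup):
    # Start from data and index one level deeper for each entry in lookup.
    return reduce(getitem, lookup, data)

rates = get_value(json_data, ['alldata', 'TimeSeries', 'rates'])
# Because getitem is plain indexing, a list position can be part of the path.
first_rate = get_value(json_data, ['alldata', 'TimeSeries', 'rates', 0])

print(rates)
print(first_rate)
```

An empty lookup returns data itself, since reduce falls back to its initial value.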
This is an extension to "In Python, how to concisely get nested values in json data?"
I have data loaded from JSON and am trying to replace arbitrary nested values using a list as input, where the list corresponds to the names of successive children. I want a function replace_value(data,lookup,value) that replaces the value in the data by treating each entry in lookup as a nested child.
Here is the structure of what I'm trying to do:
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def replace_value(data, lookup, value):
    DEFINITION
lookup = ['alldata','TimeSeries','rates']
replace_value(json_data,lookup,[2,3])
# The following should return [2,3]
print(json_data['alldata']['TimeSeries']['rates'])
I was able to make a start with get_value(), but am stumped about how to do the replacement. I'm not fixed on this code structure, but I want to be able to programmatically replace a value in the data, given the list of successive children and the replacement value.
Note: it is possible that lookup can be of length 1
Follow the lookups until you're second from the end, then assign the value under the last lookup key in that object:
def get_value(data, lookup):  # Or whatever definition you like
    res = data
    for item in lookup:
        res = res[item]
    return res

def replace_value(data, lookup, value):
    obj = get_value(data, lookup[:-1])
    obj[lookup[-1]] = value
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
lookup = ['alldata','TimeSeries','rates']
replace_value(json_data,lookup,[2,3])
print(json_data['alldata']['TimeSeries']['rates']) # [2, 3]
If you're worried about the list copy lookup[:-1], you can replace it with an iterator slice:
from itertools import islice
def replace_value(data, lookup, value):
    it = iter(lookup)
    # Named parents rather than slice to avoid shadowing the built-in.
    parents = islice(it, len(lookup) - 1)
    obj = get_value(data, parents)
    final = next(it)
    obj[final] = value
You can obtain the parent to the final sub-dict first, so that you can reference it to alter the value of that sub-dict under the final key:
def replace_value(data, lookup, replacement):
    *parents, key = lookup
    for parent in parents:
        data = data[parent]
    data[key] = replacement
so that:
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
lookup = ['alldata','TimeSeries','rates']
replace_value(json_data,lookup,[2,3])
print(json_data['alldata']['TimeSeries']['rates'])
outputs:
[2, 3]
Once you have get_value, the replacement is a single assignment:
get_value(json_data, lookup[:-1])[lookup[-1]] = value
We know that both of these work with sorted():
sorted(['second', 'first', 'third'])
sorted([('first','second'), ('second', 'first'), ('first', 'third')])
By sorting the second one, the tuples are compared lexicographically; the first items are compared; if they are the same then the second items are compared, and so on.
But how can I apply a key function to all the individual strings (or whatever else is in there), so that it works for both containers and applies recursively in the second case? Let's say func converts 'first' to 3, 'second' to 1 and 'third' to 2. I want this result:
['second', 'third', 'first']
[('second', 'first'), ('first','second'), ('first', 'third')]
I made this function to use as the key, but I don't like the type checking in it: it applies func only to strings, which is not a general solution:
def recursively_apply_func_on_strings(target, func,
                                      fargs=(), fkwargs={}):
    if isinstance(target, str):
        return func(target, *fargs, **fkwargs)
    result, f = [], recursively_apply_func_on_strings
    for elem in target:
        result.append(f(elem, func, fargs, fkwargs))
    return tuple(result)

sorted(sequence, key=lambda x: recursively_apply_func_on_strings(x, func))
Is there a cleaner way to do this?
Well, despite my comment saying otherwise, I think there are a few possible ways to improve things.
One idea is to make your function a key-function factory. This way you won't need a lambda to apply it with extra arguments in your sorted call.
Another idea is to apply func to all non-iterable values (plus strings), using the abstract Iterable type from the collections.abc module to test against.
Here's some code:
# Iterable lives in collections.abc; importing it from collections
# no longer works on Python 3.10 and later.
from collections.abc import Iterable

def recursive_key(func, fargs=(), fkwargs={}):
    def key_func(target):
        if isinstance(target, str) or not isinstance(target, Iterable):
            return func(target, *fargs, **fkwargs)
        return tuple(key_func(item) for item in target)
    return key_func
You'd call it like this (sorting by hexadecimal integer value, rather than string value):
sorted([('a', 'F'), ('A', 'd')], key=recursive_key(int, (16,)))
Note that we're calling recursive_key, and its return value (a.k.a. key_func) is what is passed as the key parameter to sorted.
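As a check against the example from the question, here is the same key-function factory applied to both containers. The ranks dictionary is an assumed stand-in for func (mapping 'first' to 3, 'second' to 1 and 'third' to 2), and fkwargs defaults to None here rather than a shared dict, which is a small deviation from the code above.

```python
from collections.abc import Iterable

def recursive_key(func, fargs=(), fkwargs=None):
    """Build a key function that applies func to every leaf value, recursively."""
    fkwargs = fkwargs or {}
    def key_func(target):
        # Strings are iterable but are treated as leaf values here.
        if isinstance(target, str) or not isinstance(target, Iterable):
            return func(target, *fargs, **fkwargs)
        return tuple(key_func(item) for item in target)
    return key_func

# Assumed stand-in for func from the question.
ranks = {'first': 3, 'second': 1, 'third': 2}
func = ranks.get

flat = sorted(['second', 'first', 'third'], key=recursive_key(func))
nested = sorted([('first', 'second'), ('second', 'first'), ('first', 'third')],
                key=recursive_key(func))
hex_sorted = sorted([('a', 'F'), ('A', 'd')], key=recursive_key(int, (16,)))

print(flat)
print(nested)
print(hex_sorted)
```

Both results match the orderings asked for in the question, and the hexadecimal call sorts ('A', 'd') first because 0xd is smaller than 0xF once the first elements tie at 10.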