Parse nested JSON in Flask - json

I have a REST API endpoint in which I need to parse incoming nested JSON of the format:
site: {
id: '37251',
site_name: 'TestSite',
address: {
'address': '1234 Blaisdell Ave',
'city': 'Minneapolis',
'state': 'MN',
'zip': '55456',
'neighborhood': 'Kingfield',
'county': 'Hennepin',
},
geolocation: {
latitude : '41.6544',
longitude : '73.3322',
accuracy: '45'
}
}
into the following SQLAlchemy classes:
Site:
class Site(db.Model):
    __tablename__ = 'site'
    id = Column(Integer, primary_key=True, autoincrement=True)
    site_name = Column(String(80))  # does site have a formal name
    address_id = Column(Integer, ForeignKey('address.id'))
    address = relationship("Address", backref=backref("site", uselist=False))
    geoposition_id = Column(Integer, ForeignKey('geoposition.id'))
    geoposition = relationship("Geoposition", backref=backref("site", uselist=False))
    evaluations = relationship("Evaluation", backref="site")
    site_maintainers = relationship("SiteMaintainer", backref="site")
Address (a Site has one Address):
class Address(db.Model):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True, autoincrement=True)
    address = Column(String(80))
    city = Column(String(80))
    state = Column(String(2))
    zip = Column(String(5))
    neighborhood = Column(String(80))
    county = Column(String(80))
and Geoposition (a Site has one Geoposition):
class Geoposition(db.Model):
    __tablename__ = 'geoposition'
    id = Column(Integer, primary_key=True, autoincrement=True)
    site_id = Column(Integer)
    latitude = Column(Float(20))
    longitude = Column(Float(20))
    accuracy = Column(Float(20))
    timestamp = Column(DateTime)
Getting the SQLAlchemy data into JSON is easy, but I need to parse the JSON from my request so that I can append/update data that is sent via POST to the RESTful API. I know how to handle non-nested JSON, but I will be the first to admit that I am clueless in dealing with nested JSON for records that belong to multiple tables in a relational structure.
I've searched high and low for this without any luck. The closest I could find is "Nested validation with the flask-restful RequestParser", but it is not clicking for what I need to do with my nested structure.

Cool! Looks like Flask handles this through the request handler:
With this JSON:
{
    "site": {
        "id": "37251",
        "site_name": "TestSite",
        "address": {
            "address": "1234 Blaisdell Ave",
            "city": "Minneapolis",
            "state": "MN",
            "zip": "55456",
            "neighborhood": "Kingfield",
            "county": "Hennepin"
        },
        "geolocation": {
            "latitude": "41.6544",
            "longitude": "73.3322",
            "accuracy": "45"
        }
    }
}
Sent to this endpoint:
@app.route('/api/resource', methods=['GET', 'POST', 'OPTIONS'])
@cross_origin()  # allow all origins all methods
@auth.login_required
def get_resource():
    # Get the parsed JSON body of the request
    json = request.json
    print(json)
    # Echo the parsed body back as a JSON response
    return jsonify(json)
I get the following object:
{u'site': {u'geolocation': {u'latitude': u'41.6544', u'longitude': u'73.3322', u'accuracy': u'45'}, u'site_name': u'TestSite', u'id': u'37251', u'address': {u'city': u'Minneapolis', u'neighborhood': u'Kingfield', u'zip': u'55456', u'county': u'Hennepin', u'state': u'MN', u'address': u'1234 Blaisdell Ave'}}}
UPDATE:
I am able to access all my dictionary items just fine using this code for a test in my endpoint:
# print entire object
print(json['site'])
# define dictionary item for entire object
site = json['site']
print(site["site_name"])
# print address object
print(site['address'])
# define address dictionary object
address = json['site']['address']
print(address["address"])
# define geolocation dictionary object
geolocation = json['site']['geolocation']
print(geolocation["accuracy"])
In retrospect, this seems rather trivial now. I hope this helps someone in the future.
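To get from that dictionary to the three tables in the question, each nested object can be pulled out and converted on its own. The following is a minimal sketch rather than the poster's code: `split_site_payload` is a hypothetical helper, the key names are taken from the JSON above, and the float conversion assumes the Geoposition columns stay `Float`.

```python
def split_site_payload(payload):
    """Split the nested request body into one dict of column values per table.

    Hypothetical helper; key names follow the JSON shown in this thread.
    """
    site = payload["site"]
    address = site.get("address", {})
    geo = site.get("geolocation", {})

    site_fields = {"id": int(site["id"]), "site_name": site["site_name"]}
    address_fields = {
        key: address.get(key)
        for key in ("address", "city", "state", "zip", "neighborhood", "county")
    }
    # The payload sends numbers as strings; convert them for the Float columns.
    geo_fields = {
        key: float(geo[key])
        for key in ("latitude", "longitude", "accuracy")
        if key in geo
    }
    return site_fields, address_fields, geo_fields


payload = {
    "site": {
        "id": "37251",
        "site_name": "TestSite",
        "address": {"address": "1234 Blaisdell Ave", "city": "Minneapolis",
                    "state": "MN", "zip": "55456",
                    "neighborhood": "Kingfield", "county": "Hennepin"},
        "geolocation": {"latitude": "41.6544", "longitude": "73.3322",
                        "accuracy": "45"},
    }
}

site_fields, address_fields, geo_fields = split_site_payload(payload)
print(site_fields)
print(address_fields)
print(geo_fields)
```

Assuming the models keep their default keyword constructors, the three dicts could then be combined along the lines of `Site(address=Address(**address_fields), geoposition=Geoposition(**geo_fields), **site_fields)` before adding the object to the session.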

Do you have access and can edit the JSON?
A few edits would help make it a valid JSON:
Use double quotes for keys and values
Open and close JSON with { and }
Delete the trailing , in the county line
If you can do so, your JSON will look like this:
{
"site": {
"id": "37251",
"site_name": "TestSite",
"address": {
"address": "1234BlaisdellAve",
"city": "Minneapolis",
"state": "MN",
"zip": "55456",
"neighborhood": "Kingfield",
"county": "Hennepin"
},
"geolocation": {
"latitude": "41.6544",
"longitude": "73.3322",
"accuracy": "45"
}
}
}
Once that is sorted out, use Python's json module to deal with it:
import json
with open('test.json', 'r') as file_handler:
    parsed_data = json.load(file_handler)
print(parsed_data)
The output is a dictionary that you can easily iterate over to validate your data:
{u'site': {u'geolocation': {u'latitude': u'41.6544', u'longitude': u'73.3322', u'accuracy': u'45'}, u'site_name': u'TestSite', u'id': u'37251', u'address': {u'city': u'Minneapolis', u'neighborhood': u'Kingfield', u'zip': u'55456', u'county': u'Hennepin', u'state': u'MN', u'address': u'1234BlaisdellAve'}}}
But if you can't edit your JSON to make its syntax better, json.loads would not parse it…
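A small sketch of what that failure looks like in practice, assuming Python 3.5 or later (where `json.JSONDecodeError` exists). The `try_parse` helper and the two sample strings are made up for illustration.

```python
import json

# Unquoted keys and single-quoted values, as in the question: not valid JSON.
bad_text = "{site: {id: '37251'}}"
good_text = '{"site": {"id": "37251"}}'


def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:  # a subclass of ValueError
        print("invalid JSON:", exc.msg, "at position", exc.pos)
        return None


print(try_parse(bad_text))
print(try_parse(good_text))
```

In a web endpoint, the `None` branch is where a 400 response would normally be returned instead of letting the exception propagate.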

Related

Pulling specific Parent/Child JSON data with Python

I'm having a difficult time figuring out how to pull specific information from a json file.
So far I have this:
# Import json library
import json
# Open json database file
with open('jsondatabase.json', 'r') as f:
    data = json.load(f)
# assign variables from json data and convert to usable information
identifier = data['ID']
identifier = str(identifier)
name = data['name']
name = str(name)
# Collect data from user to compare with data in json file
print("Please enter your numerical identifier and name: ")
user_id = input("Numerical identifier: ")
user_name = input("Name: ")
if user_id == identifier and user_name == name:
    print("Your inputs matched. Congrats.")
else:
    print("Your inputs did not match our data. Please try again.")
And that works great for a simple JSON file like this:
{
"ID": "123",
"name": "Bobby"
}
But ideally I need to create a more complex JSON file and can't find deeper information on how to pull specific information from something like this:
{
"Parent": [
{
"Parent_1": [
{
"Name": "Bobby",
"ID": "123"
}
],
"Parent_2": [
{
"Name": "Linda",
"ID": "321"
}
]
}
]
}
Here is an example that you might be able to pick apart.
You could either:
Make a custom de-jsonify object_hook as shown below and do something with it. There is a good tutorial here.
Just gobble up the whole dictionary that you get without a custom de-jsonify and drill down into it and make a list or set of the results. (not shown)
Example:
import json
from collections import namedtuple
data = '''
{
"Parents":
[
{
"Name": "Bobby",
"ID": "123"
},
{
"Name": "Linda",
"ID": "321"
}
]
}
'''
Parent = namedtuple('Parent', ['name', 'id'])
def dejsonify(json_str: dict):
    if json_str.get("Name"):
        parent = Parent(json_str.get('Name'), int(json_str.get('ID')))
        return parent
    return json_str
res = json.loads(data, object_hook=dejsonify)
print(res)
# then we can do whatever... if you need lookups by name/id,
# we could put the result into a dictionary
all_parents = {(p.name, p.id) : p for p in res['Parents']}
lookup_from_input = ('Bobby', 123)
print(f'found match: {all_parents.get(lookup_from_input)}')
Result:
{'Parents': [Parent(name='Bobby', id=123), Parent(name='Linda', id=321)]}
found match: Parent(name='Bobby', id=123)
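The second option mentioned above (taking the plain dictionary and drilling down into it) was not shown, so here is one way it might look. This sketch uses the nested structure from the question as an inline string; the variable names are arbitrary.

```python
import json

text = '''
{
  "Parent": [
    {
      "Parent_1": [{"Name": "Bobby", "ID": "123"}],
      "Parent_2": [{"Name": "Linda", "ID": "321"}]
    }
  ]
}
'''

data = json.loads(text)

# "Parent" is a list of dicts, and each value in those dicts is itself a
# list of records, so three nested loops reach the Name/ID pairs.
known = set()
for group in data["Parent"]:
    for records in group.values():
        for record in records:
            known.add((record["Name"], record["ID"]))

print(("Bobby", "123") in known)
print(("Bobby", "999") in known)
```

Collecting the pairs into a set keeps the later comparison with user input to a single membership test. The IDs are left as strings here because `input()` also returns strings.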

Emit Python embedded object as native JSON in YAML document

I'm importing webservice tests from Excel and serialising them as YAML.
But taking advantage of YAML being a superset of JSON, I'd like the request part of the test to be valid JSON, i.e. to have delimiters, quotes and commas.
This will allow us to cut and paste requests between the automated test suite and manual test tools (e.g. Postman.)
So here's how I'd like a test to look (simplified):
- properties:
METHOD: GET
TYPE: ADDRESS
Request URL: /addresses
testCaseId: TC2
request:
{
"unitTypeCode": "",
"unitNumber": "15",
"levelTypeCode": "L",
"roadNumber1": "810",
"roadName": "HAY",
"roadTypeCode": "ST",
"localityName": "PERTH",
"postcode": "6000",
"stateTerritoryCode": "WA"
}
In Python, my request object has a dict attribute called fields which is the part of the object to be serialised as JSON. This is what I tried:
import json
import yaml
def request_presenter(dumper, request):
    json_string = json.dumps(request.fields, indent=8)
    return dumper.represent_str(json_string)
yaml.add_representer(Request, request_presenter)
test = Test(...including embedded request object)
serialised_test = yaml.dump(test)
I'm getting:
- properties:
METHOD: GET
TYPE: ADDRESS
Request URL: /addresses
testCaseId: TC2
request: "{
\"unitTypeCode\": \"\",\n
\"unitNumber\": \"15\",\n
\"levelTypeCode": \"L\",\n
\"roadNumber1\": \"810\",\n
\"roadName\": \"HAY\",\n
\"roadTypeCode\": \"ST\",\n
\"localityName\": \"PERTH\",\n
\"postcode\": \"6000\",\n
\"stateTerritoryCode\": \"WA\"\n
}"
...only worse because it's all on one line and has white space all over the place.
I tried using the | style for literal multi-line strings which helps with the line breaks and escaped quotes (it's more involved but this answer was helpful.) However, escaped or multiline, the result is still a string that will need to be parsed separately.
How can I stop PyYaml analysing the JSON block as a string and make it just accept a block of text as part of the emitted YAML? I'm guessing it's something to do with overriding the emitter but I could use some help. If possible I'd like to avoid post-processing the serialised test to achieve this.
Ok, so this was the solution I came up with. Generate the YAML with a placemarker ahead of time. The placemarker marks the place where the JSON should be inserted, and also defines the root-level indentation of the JSON block.
import itertools
import json
def insert_json_in_yaml(pre_insert_yaml, key, obj_to_serialise):
    marker = '%s: null' % key
    marker_line = line_of_first_occurrence(pre_insert_yaml, marker)
    marker_indent = string_indent(marker_line)
    serialised = json.dumps(obj_to_serialise, indent=marker_indent + 4)
    key_with_json = '%s: %s' % (key, serialised)
    serialised_with_json = pre_insert_yaml.replace(marker, key_with_json)
    return serialised_with_json
def line_of_first_occurrence(text, substring):
    """
    return the line containing the first occurrence of substring
    """
    lineno = lineno_of_first_occurrence(text, substring)
    # split on '\n' rather than os.linesep: strings built in Python use '\n'
    # on every platform, so os.linesep ('\r\n' on Windows) would never match
    return text.split('\n')[lineno]
def string_indent(s):
    """
    return indentation of a string (no of spaces before a nonspace)
    """
    spaces = ''.join(itertools.takewhile(lambda c: c == ' ', s))
    return len(spaces)
def lineno_of_first_occurrence(text, substring):
    """
    return line number of first occurrence of substring
    """
    return text[:text.index(substring)].count('\n')
embedded_object = {
"unitTypeCode": "",
"unitNumber": "15",
"levelTypeCode": "L",
"roadNumber1": "810",
"roadName": "HAY",
"roadTypeCode": "ST",
"localityName": "PERTH",
"postcode": "6000",
"stateTerritoryCode": "WA"
}
yaml_string = """
---
- properties:
METHOD: GET
TYPE: ADDRESS
Request URL: /addresses
testCaseId: TC2
request: null
after_request: another value
"""
>>> print(insert_json_in_yaml(yaml_string, 'request', embedded_object))
- properties:
METHOD: GET
TYPE: ADDRESS
Request URL: /addresses
testCaseId: TC2
request: {
"unitTypeCode": "",
"unitNumber": "15",
"levelTypeCode": "L",
"roadNumber1": "810",
"roadName": "HAY",
"roadTypeCode": "ST",
"localityName": "PERTH",
"postcode": "6000",
"stateTerritoryCode": "WA"
}
after_request: another value
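One detail worth knowing when adapting this: the `indent` argument of `json.dumps` only sets the step for nested levels, so the closing brace of the top-level object always lands in column 0. If you would rather shift the whole block so the closing brace lines up with the key, something like the following might work. `json_block` is a hypothetical helper, not part of the solution above.

```python
import json
import textwrap


def json_block(obj, base_indent, step=4):
    """Serialise obj as JSON and shift every line after the first by base_indent spaces."""
    text = json.dumps(obj, indent=step)
    first, _, rest = text.partition("\n")
    if not rest:
        # Single-line output such as "{}" needs no shifting.
        return first
    return first + "\n" + textwrap.indent(rest, " " * base_indent)


print("  request: " + json_block({"unitNumber": "15", "roadName": "HAY"}, base_indent=2))
```

The first line is left alone because it follows the key on the same line; only the continuation lines need the extra prefix.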

Python: JSON to Dictionary

Two examples for a JSON request. Both examples should have the correct JSON syntax, yet only the second version seems to be translatable to a dictionary.
#doesn't work
string_js3 = """{"employees": [
{
"FNAME":"FTestA",
"LNAME":"LTestA",
"SSN":6668844441
},
{
"FNAME":"FTestB",
"LNAME":"LTestB",
"SSN":6668844442
}
]}
"""
#works
string_js4 = """[
{
"FNAME":"FTestA",
"LNAME":"LTestA",
"SSN":6668844441
},
{
"FNAME":"FTestB",
"LNAME":"LTestB",
"SSN":6668844442
}]
"""
This gives an error, while the same code with string_js4 works:
L1 = json.loads(string_js3)
print(L1[0]['FNAME'])
So I have 2 questions:
1) Why doesn't the first version work?
2) Is there a simple way to make the first version also work?
Both of these strings are valid JSON. Where you are getting stuck is in how you are accessing the resulting data structures.
L1 (from string_js3) is a (nested) dict;
L2 (from string_js4) is a list of dicts.
Walkthrough:
import json
string_js3 = """{
"employees": [{
"FNAME": "FTestA",
"LNAME": "LTestA",
"SSN": 6668844441
},
{
"FNAME": "FTestB",
"LNAME": "LTestB",
"SSN": 6668844442
}
]
}"""
string_js4 = """[{
"FNAME": "FTestA",
"LNAME": "LTestA",
"SSN": 6668844441
},
{
"FNAME": "FTestB",
"LNAME": "LTestB",
"SSN": 6668844442
}
]"""
L1 = json.loads(string_js3)
L2 = json.loads(string_js4)
The resulting objects:
L1
{'employees': [{'FNAME': 'FTestA', 'LNAME': 'LTestA', 'SSN': 6668844441},
{'FNAME': 'FTestB', 'LNAME': 'LTestB', 'SSN': 6668844442}]}
L2
[{'FNAME': 'FTestA', 'LNAME': 'LTestA', 'SSN': 6668844441},
{'FNAME': 'FTestB', 'LNAME': 'LTestB', 'SSN': 6668844442}]
type(L1), type(L2)
(dict, list)
1) Why doesn't the first version work?
Because L1[0] tries to look up the key 0 in a dictionary, and that key doesn't exist, so Python raises a KeyError. From the docs, "It is an error to extract a value using a non-existent key." L1 is a dictionary with just one key:
L1.keys()
dict_keys(['employees'])
2) Is there a simple way to make the first version also work?
There are several ways, but it ultimately depends on what your larger problem looks like. I'm going to assume you want to modify the Python code rather than the JSON files/strings themselves. You could do:
L3 = L1['employees'].copy()
You now have a list of dictionaries that resembles L2:
L3
[{'FNAME': 'FTestA', 'LNAME': 'LTestA', 'SSN': 6668844441},
{'FNAME': 'FTestB', 'LNAME': 'LTestB', 'SSN': 6668844442}]
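If the same code has to cope with both shapes, another possibility is to normalise right after parsing. This is only a sketch: `employee_list` is a made-up helper, and the strings are shortened versions of the ones in the question.

```python
import json

string_js3 = '{"employees": [{"FNAME": "FTestA"}, {"FNAME": "FTestB"}]}'
string_js4 = '[{"FNAME": "FTestA"}, {"FNAME": "FTestB"}]'


def employee_list(text):
    """Return the list of employee dicts from either JSON shape."""
    parsed = json.loads(text)
    if isinstance(parsed, dict):
        # Wrapped form: the list sits under the "employees" key.
        return parsed["employees"]
    return parsed


print(employee_list(string_js3)[0]["FNAME"])
print(employee_list(string_js4)[0]["FNAME"])
```

After this step both inputs can be indexed the same way, so the original `[0]['FNAME']` access works for either string.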

GAE python27 return nested json

This seems such a simple task, yet it eludes me...
class ViewAllDogs(webapp2.RequestHandler):
    """ Returns an array of json objects representing all dogs. """
    def get(self):
        query = Dog.query()
        results = query.fetch(limit = MAX_DOGS) # 100
        aList = []
        for match in results:
            aList.append({'id': match.id, 'name': match.name,
                          'owner': match.owner, 'arrival_date': match.arrival_date})
            aList.append({'departure_history': {'departure_date': match.departure_date,
                                                'departed_dog': match.departed_dog}})
        self.response.headers['Content-Type'] = 'application/json'
        self.response.write(json.dumps(aList))
The above, my best attempt to date, gets me:
[
{
"arrival_date": null,
"id": "a link to self",
"owner": 354773,
"name": "Rover"
},
{
"departure_history": {
"departed_dog": "Jake",
"departure_date": 04/24/2017
}
},
# json array of objects continues...
]
What I'm trying to get is the departure_history nested:
[
{
"id": "a link to self...",
"owner": 354773,
"name": "Rover",
"departure_history": {
"departed_dog": "Jake",
"departure_date": 04/24/2017
},
"arrival_date": 04/25/2017,
},
# json array of objects continues...
]
I've tried a bunch of different combinations and looked at the json docs and the python27 docs, with no joy, and I've burned way too many hours on this. I got this far with the many related SO posts on this topic. Thanks in advance.
You can simplify a little:
aList = []
for match in results:
    aDog = {'id': match.id,
            'name': match.name,
            'owner': match.owner,
            'arrival_date': match.arrival_date,
            'departure_history': {
                'departure_date': match.departure_date,
                'departed_dog': match.departed_dog}
            }
    aList.append(aDog)
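For completeness, here is a self-contained sketch of the same nesting that can be run outside App Engine. The `Dog` namedtuple is a stand-in for the datastore entity (the real model is not shown in the post), and the sketch assumes the two date fields are `datetime.date` objects, which `json.dumps` cannot serialise without a `default` fallback.

```python
import json
from collections import namedtuple
from datetime import date

# Stand-in for the datastore entities; the real Dog model is not shown.
Dog = namedtuple("Dog", "id name owner arrival_date departure_date departed_dog")

results = [
    Dog("a link to self", "Rover", 354773,
        date(2017, 4, 25), date(2017, 4, 24), "Jake"),
]

# One dict per dog, with departure_history nested inside it.
dogs = [
    {
        "id": match.id,
        "name": match.name,
        "owner": match.owner,
        "arrival_date": match.arrival_date,
        "departure_history": {
            "departure_date": match.departure_date,
            "departed_dog": match.departed_dog,
        },
    }
    for match in results
]


def encode_date(value):
    """Fallback for json.dumps: turn dates into ISO 8601 strings."""
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError("not JSON serialisable: %r" % (value,))


body = json.dumps(dogs, default=encode_date, indent=2)
print(body)
```

Building each dog as a single dict before appending is what keeps `departure_history` inside its parent object rather than in a separate list entry.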
This seems a bit hackish, but it works. If you know a better way, by all means, let me know. Thanks.
class ViewAllDogs(webapp2.RequestHandler):
    """ Returns an array of json objects representing all dogs. """
    def get(self):
        query = Dog.query()
        results = query.fetch(limit = MAX_DOGS) # 100
        aList = []
        i = 0
        for match in results:
            aList.append({'id': match.id, 'name': match.name,
                          'owner': match.owner, 'arrival_date': match.arrival_date})
            # attach the nested object directly under the 'departure_history' key
            aList[i]['departure_history'] = {'departure_date': match.departure_date,
                                             'departed_dog': match.departed_dog}
            i += 1
        self.response.headers['Content-Type'] = 'application/json'
        self.response.write(json.dumps(aList))

compare input to fields in a json file in ruby

I am trying to create a function that takes an input, in this case a tracking code, looks that tracking code up in a JSON file, and then returns the tracking code as output. The JSON file is as follows:
[
{
"tracking_number": "IN175417577",
"status": "IN_TRANSIT",
"address": "237 Pentonville Road, N1 9NG"
},
{
"tracking_number": "IN175417578",
"status": "NOT_DISPATCHED",
"address": "Holly House, Dale Road, Coalbrookdale, TF8 7DT"
},
{
"tracking_number": "IN175417579",
"status": "DELIVERED",
"address": "Number 10 Downing Street, London, SW1A 2AA"
}
]
I have started using this function:
def compare_content(tracking_number)
  File.open("pages/tracking_number.json", "r") do |file|
    file.print()
  end
end
Not sure how I would compare the input to the json file. Any help would be much appreciated.
You can use the built-in JSON module.
require 'json'
def compare_content(tracking_number)
  # Loads ENTIRE file into string. Will not be effective on very large files
  json_string = File.read("pages/tracking_number.json")
  # Uses the JSON module to create an array from the JSON string
  array_from_json = JSON.parse(json_string)
  # Iterates through the array of hashes
  array_from_json.each do |tracking_hash|
    if tracking_number == tracking_hash["tracking_number"]
      # If this code runs, tracking_hash has the data for the number you are looking up
    end
  end
end
This will parse the JSON supplied into an array of hashes which you can then compare to the number you are looking up.
If you are the one generating the JSON file and this method will be called a lot, consider mapping the tracking numbers directly to their data for this method to potentially run much faster. For example,
{
"IN175417577": {
"status": "IN_TRANSIT",
"address": "237 Pentonville Road, N1 9NG"
},
"IN175417578": {
"status": "NOT_DISPATCHED",
"address": "Holly House, Dale Road, Coalbrookdale, TF8 7DT"
},
"IN175417579": {
"status": "DELIVERED",
"address": "Number 10 Downing Street, London, SW1A 2AA"
}
}
That would parse into a hash, where you could much more easily grab the data:
require 'json'
def compare_content(tracking_number)
  json_string = File.read("pages/tracking_number.json")
  hash_from_json = JSON.parse(json_string)
  if hash_from_json.key?(tracking_number)
    tracking_hash = hash_from_json[tracking_number]
  else
    # Tracking number does not exist
  end
end