How to read dynamic/changing keys from JSON in Python - json

I am using the UK Bus API to collect bus arrival times etc.
In Python 3 I have been using....
try:
connection = http.client.HTTPSConnection("transportapi.com")
connection.request("GET", "/v3/uk/bus/stop/xxxxxxxxx/live.json?app_id=xxxxxxxxxxxxxxx&app_key=xxxxxxxxxxxxxxxxxxxxxxxxxxx&group=route&nextbuses=yes")
res = connection.getresponse()
data = res.read()
connection.close()
from types import SimpleNamespace as Namespace
x = json.loads(data, object_hook=lambda d: Namespace(**d))
print("Stop Name : " + x.stop_name)
Which is all reasonably simple, however the JSON data returned looks like this...
{
"atcocode":"xxxxxxxx",
"smscode":"xxxxxxxx",
"request_time":"2020-03-10T15:42:22+00:00",
"name":"Hospital",
"stop_name":"Hospital",
"bearing":"SE",
"indicator":"adj",
"locality":"Here",
"location":{
"type":"Point",
"coordinates":[
-1.xxxxx,
50.xxxxx
]
},
"departures":{
"8":[
{
"mode":"bus",
"line":"8",
"line_name":"8",
"direction":"North",
"operator":"CBLE",
"date":"2020-03-10",
Under "departures" the key name changes due to the bus number / code.
Using Python 3 how do I extract the key name and all subsequent values below/within it?
Many thanks for any help!

You can do this:
for k,v in x["departures"].items():
print(k,v) #or whatever else you wanted to do.
Which returns:
8 [{'mode': 'bus', 'line': '8', 'line_name': '8', 'direction': 'North', 'operator': 'CBLE', 'date': '2020-03-10'}]
So k is equal to 8 and v is the value.

Related

FastAPI not returning list of SQLAlchemy rows properly

I am trying to return a list of SQLAlchemy rows and output it from a FastAPI endpoint. Each row in the list consists of the actor's name and the actor's total number of lines from a show. The query itself I believe is correct, but when viewing the output from the endpoint, one of the columns is missing for some reason.
The query function:
def get_actors(db: Session, detailed: bool = False) -> list[str]:
"""Return a list of actors and their total lines from the show"""
if not detailed:
query = (
db.query(models.Script.actor, func.count(models.Script.detail)) # (<actor name>, <total lines>)
.filter(models.Script.actor.isnot(None)) # Skip null actors.
.group_by(models.Script.actor) # Group unique actors.
.order_by(func.count(models.Script.detail).desc()) # Sort by most to least lines.
)
actors_list = [actor for actor in query]
print("ACTORS LIST:", actors_list) # Debug print, looks fine.
return actors_list
...
FastAPI endpoint
#holy_api.get("/actors", response_class=PrettyJSONResponse)
def get_actors(detailed: bool = False, db: Session = Depends(get_db)):
"""Get a list of all the actors from the show with their total lines, optionally detailed view"""
return crud.get_actors(db, detailed=detailed)
Now, when I open up /actors, the terminal prints out the debug looking like:
ACTORS LIST: [('Michael Palin', 2454), ('Eric Idle', 2107), ('John Cleese', 2044), ('Graham Chapman', 1848), ('Terry Jones', 1801), ('Carol Cleveland', 277), ('Terry Gilliam', 85), ('Terry\nJones', 35), ('Neil Innes', 12), ('Ian Davidson', 8), ('Connie Booth', 5), ('Katya Wyeth', 4), ('Rita Davies', 3), ('Marjorie Wilde', 3), ('Donna Reading', 2), ('Nicki Howorth', 1), ('Julia Breck', 1), ('Caron Gardener', 1)]
INFO: 127.0.0.1:54057 - "GET /actors HTTP/1.1" 200 OK
This is exactly what I need. But the actual JSON response looks like:
[
{
"actor": "Michael Palin"
},
{
"actor": "Eric Idle"
},
{
"actor": "John Cleese"
},
{
"actor": "Graham Chapman"
},
{
"actor": "Terry Jones"
},
{
"actor": "Carol Cleveland"
},
...
]
The total lines aren't shown next to the actor. Why? I am not sure if this is a problem with FastAPI not parsing the JSON properly or if this is a SQLAlchemy thing or I'm doing something plainly wrong.
Wow! Shortly after posting this question, I fixed it. All I had to do was attach a label to the aggregate count function: func.count(models.Script.detail).label("total_lines") I've been on this problem for hours and I feel really, really dumb right now.

How to efficiently parse JSON data with multiple keys in Python 2.7?

I'm writing a script that will check the CVS COVID vaccine availability for cities in my state of VA. I have been successful getting the data I'm looking for, but my code is hard coded in some areas. I'm specifically asking for help improving my code in the areas number 1 & 2 below:
The JSON file can be found here:
https://www.cvs.com//immunizations/covid-19-vaccine.vaccine-status.VA.json?vaccineinfo
I'm trying to access the data in the responsePayloadData key. The only way I could figure out how to do this is to make it the only key. For that reason, I deleted the other key responseMetaData:
#remove the key that we don't need
del obj['responseMetaData']
I'm also not sure how to dynamically loop through the VA items without hard coding the number of cities I know are there in the data:
for x, y in obj.items():
for a in range(34):
Here's the full code:
import requests
import json
import time
from datetime import datetime
import urllib2
try:
import indigo
except:
pass
strAvail = "False"
strAvailCity = "None"
try:
# download raw json object from CVS Virginia Website
url = "https://www.cvs.com//immunizations/covid-19-vaccine.vaccine-status.VA.json?vaccineinfo"
data = urllib2.urlopen(url).read().decode()
except urllib2.HTTPError, err:
return {"error": err.reason, "error_code": err.code}
# parse json object
obj = json.loads(data)
# remove the key that we don't need
del obj['responseMetaData']
# loop through the JSON dictionary and check availability
# status options: {"Fully Booked", "Available"}
for x, y in obj.items():
for a in range(34):
# print('City: ' + y['data']['VA'][a]['city'])
# print('Total Available: ' + y['data']['VA'][a]['totalAvailable'])
# print('Percent Available: ' + y['data']['VA'][a]['pctAvailable'])
# print('Status: ' + y['data']['VA'][a]['status'])
# print("------------------------------")
# If there is availability anywhere in the state, take some action.
if y['data']['VA'][a]['status'] == "Available":
strAvail = True
strAvailCity = y['data']['VA'][a]['city']
# Log timestamp for this check to the JSON
now = datetime.now()
strDateTime = now.strftime("%m/%d/%Y %I:%M %p")
EDIT: Since the JSON is not available outside the US. I've pasted it below:
{"responsePayloadData":{"currentTime":"2021-02-11T14:55:00.470","data":{"VA":[{"totalAvailable":"1","city":"ABINGDON","state":"VA","pctAvailable":"0.19%","status":"Fully Booked"},{"totalAvailable":"0","city":"ALEXANDRIA","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"ARLINGTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"BEDFORD","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"BLACKSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHARLOTTESVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHATHAM","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHESAPEAKE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"1","city":"DANVILLE","state":"VA","pctAvailable":"0.19%","status":"Fully Booked"},{"totalAvailable":"2","city":"DUBLIN","state":"VA","pctAvailable":"0.39%","status":"Fully Booked"},{"totalAvailable":"0","city":"FAIRFAX","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"FREDERICKSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"GAINESVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"HAMPTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"HARRISONBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"LEESBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"LYNCHBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MARTINSVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MECHANICSVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MIDLOTHIAN","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"NEWPORT NEWS","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"NORFOLK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"PETERSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"PORTSMOUTH","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"RICHMOND","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"ROANOKE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"ROCKY MOUNT","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"STAFFORD","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"SUFFOLK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"VIRGINIA BEACH","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WARRENTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WILLIAMSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WINCHESTER","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WOODSTOCK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"}]}},"responseMetaData":{"statusDesc":"Success","conversationId":"Id-beb5f68730b34e6aa3bbc1fd927ea12b","refId":"Id-b4a7256078789eb59b8912b4","operation":"getInventorybyCity","statusCode":"0000"}}
Regarding problem 1, you can just access the data by key. You don't need to delete the other key:
payload = obj['responsePayloadData']
For the second problem, you can just iterate over the items in the list associated with obj['data']['VA']:
for city in payload['data']['VA']:
print(city)
{'city': 'ABINGDON',
'pctAvailable': '0.19%',
'state': 'VA',
'status': 'Fully Booked',
'totalAvailable': '1'}
{'city': 'ALEXANDRIA',
'pctAvailable': '0.00%',
'state': 'VA',
'status': 'Fully Booked',
'totalAvailable': '0'}
...

Aggregate JSON files in Spark RDD

I have a series of files that look similar to this:
[
{
'id':1,
'transactions': [
{
'date': '2019-01-01',
'amount': 50.50
},
{
'date': '2019-01-02',
'amount': 10.20
},
]
},
{
'id':2,
'transactions': [
{
'date': '2019-01-01',
'amount': 10.20
},
{
'date': '2019-01-02',
'amount': 0.50
},
]
}
]
I load these files to Spark using the following code
users= spark.read.option("multiline", "true").json(file_location)
The result is a SparkData Frame with two columns id and transactions where transactions is a StructType.
I want to be able to "map" the transactions per user to aggregate them.
Currently I am using rdd and a function that looks like this:
users.rdd.map(lambda a: summarize_transactions(a.transactions))
The summarize function can be of two types:
a) Turn the list of objects into a Pandas Dataframe to summarize it.
b) Iterate over the list of objects to summarize it.
However I find out that a.transactions is a list of pyspark.sql.types.Row. Instead of actual dictionaries.
1) Is this the best way to accomplish my goal?
2) How can I turn the list of Spark Rows into the original list of Dictionaries?
I found a way to solve my own problem:
STEP 1: LOAD DATA AS TEXTFILE:
step1= sc.textFile(file_location)
STEP 2: READ AS JSON AND FLATMAP
import json
step2 = step1.map(lambda a: json.loads(a)).flatMap(lambda a: a)
STEP 3: KEY MAP REDUCE
setp3 = (
step2
.map(lambda line: [line['id'], line['transactions']])
.reduceByKey(lambda a, b: a + b)
.mapValues(lambda a: summarize_transactions(a))
)

Getting values from Json data in Python

I have a json file that I am trying to pull specific attribute data from. The Json data is essentially a dictionary. Before the data is turned into a file, it is first held in a variable like this:
params = {'f': 'json', 'where': '1=1', 'geometryType': 'esriGeometryPolygon', 'spatialRel': 'esriSpatialRelIntersects','outFields': '*', 'returnGeometry': 'true'}
r = requests.get('https://hazards.fema.gov/gis/nfhl/rest/services/CSLF/Prelim_CSLF/MapServer/3/query', params)
cslfJson = r.json()
and then written into a file like this:
path = r"C:/Workspace/Sandbox/ScratchTests/cslf.json"
with open(path, 'w') as f:
json.dump(cslfJson, f, indent=2)
within this json data is an attribute called DFIRM_ID. I want to create an empty list called dfirm_id = [], get all of the values for DFIRM_ID and for that value, append it to the list like this dfirm_id.append(value). I am thinking I need to somehow read through the json variable data or the actual file, but I am not sure how to do it. Any suggestions on an easy method to accomplish this?
dfirm_id = []
for k, v in cslf:
if cslf[k] == 'DFIRM_ID':
dfirm.append(cslf[v])
As requested, here is what print(cslfJson) looks like:
It actually prints a huge dictionary that looks like this:
{'displayFieldName': 'CSLF_ID', 'fieldAliases': {'OBJECTID':
'OBJECTID', 'CSLF_ID': 'CSLF_ID', 'Area_SF': 'Area_SF', 'Pre_Zone':
'Pre_Zone', 'Pre_ZoneST': 'Pre_ZoneST', 'PRE_SRCCIT': 'PRE_SRCCIT',
'NEW_ZONE': 'NEW_ZONE', 'NEW_ZONEST': .... {'attributes': {'OBJECTID':
26, 'CSLF_ID': '13245C_26', 'Area_SF': 5.855231804165408e-05,
'Pre_Zone': 'X', 'Pre_ZoneST': '0.2 PCT ANNUAL CHANCE FLOOD HAZARD',
'PRE_SRCCIT': '13245C_STUDY1', 'NEW_ZONE': 'A', 'NEW_ZONEST': None,
'NEW_SRCCIT': '13245C_STUDY2', 'CHHACHG': 'None (Zero)', 'SFHACHG':
'Increase', 'FLDWYCHG': 'None (Zero)', 'NONSFHACHG': 'Decrease',
'STRUCTURES': None, 'POPULATION': None, 'HUC8_CODE': None, 'CASE_NO':
None, 'VERSION_ID': '2.3.3.3', 'SOURCE_CIT': '13245C_STUDY2', 'CID':
'13245C', 'Pre_BFE': -9999, 'Pre_BFE_LEN_UNIT': None, 'New_BFE':
-9999, 'New_BFE_LEN_UNIT': None, 'BFECHG': 'False', 'ZONECHG': 'True', 'ZONESTCHG': 'True', 'DFIRM_ID': '13245C', 'SHAPE_Length':
0.009178426056888393, 'SHAPE_Area': 4.711699932249018e-07, 'UID': 'f0125a91-2331-4318-9a50-d77d042a48c3'}}, {'attributes': .....}
If your json data is already a dictionary, then take advantage of that. The beauty of a dictionary / hashmap is that it provides an average time complexity of O(1).
Based on your comment, I believe this will solve your problem:
dfirm_id = []
for feature in cslf['features']:
dfirm_id.append(feature['attributes']['DFIRM_ID'])

Postgres JSON data type Rails query

I am using Postgres' json data type but want to do a query/ordering with data that is nested within the json.
I want to order or query with .where on the json data type. For example, I want to query for users that have a follower count > 500 or I want to order by follower or following count.
Thanks!
Example:
model User
data: {
"photos"=>[
{"type"=>"facebook", "type_id"=>"facebook", "type_name"=>"Facebook", "url"=>"facebook.com"}
],
"social_profiles"=>[
{"type"=>"vimeo", "type_id"=>"vimeo", "type_name"=>"Vimeo", "url"=>"http://vimeo.com/", "username"=>"v", "id"=>"1"},
{"bio"=>"I am not a person, but a series of plants", "followers"=>1500, "following"=>240, "type"=>"twitter", "type_id"=>"twitter", "type_name"=>"Twitter", "url"=>"http://www.twitter.com/", "username"=>"123", "id"=>"123"}
]
}
For any who stumbles upon this. I have come up with a list of queries using ActiveRecord and Postgres' JSON data type. Feel free to edit this to make it more clear.
Documentation to the JSON operators used below: https://www.postgresql.org/docs/current/functions-json.html.
# Sort based on the Hstore data:
Post.order("data->'hello' DESC")
=> #<ActiveRecord::Relation [
#<Post id: 4, data: {"hi"=>"23", "hello"=>"22"}>,
#<Post id: 3, data: {"hi"=>"13", "hello"=>"21"}>,
#<Post id: 2, data: {"hi"=>"3", "hello"=>"2"}>,
#<Post id: 1, data: {"hi"=>"2", "hello"=>"1"}>]>
# Where inside a JSON object:
Record.where("data ->> 'likelihood' = '0.89'")
# Example json object:
r.column_data
=> {"data1"=>[1, 2, 3],
"data2"=>"data2-3",
"array"=>[{"hello"=>1}, {"hi"=>2}],
"nest"=>{"nest1"=>"yes"}}
# Nested search:
Record.where("column_data -> 'nest' ->> 'nest1' = 'yes' ")
# Search within array:
Record.where("column_data #>> '{data1,1}' = '2' ")
# Search within a value that's an array:
Record.where("column_data #> '{array,0}' ->> 'hello' = '1' ")
# this only find for one element of the array.
# All elements:
Record.where("column_data ->> 'array' LIKE '%hello%' ") # bad
Record.where("column_data ->> 'array' LIKE ?", "%hello%") # good
According to this http://edgeguides.rubyonrails.org/active_record_postgresql.html#json
there's a difference in using -> and ->>:
# db/migrate/20131220144913_create_events.rb
create_table :events do |t|
t.json 'payload'
end
# app/models/event.rb
class Event < ActiveRecord::Base
end
# Usage
Event.create(payload: { kind: "user_renamed", change: ["jack", "john"]})
event = Event.first
event.payload # => {"kind"=>"user_renamed", "change"=>["jack", "john"]}
## Query based on JSON document
# The -> operator returns the original JSON type (which might be an object), whereas ->> returns text
Event.where("payload->>'kind' = ?", "user_renamed")
So you should try Record.where("data ->> 'status' = 200 ") or the operator that suits your query (http://www.postgresql.org/docs/current/static/functions-json.html).
Your question doesn't seem to correspond to the data you've shown, but if your table is named users and data is a field in that table with JSON like {count:123}, then the query
SELECT * WHERE data->'count' > 500 FROM users
will work. Take a look at your database schema to make sure you understand the layout and check that the query works before complicating it with Rails conventions.
JSON filtering in Rails
Event.create( payload: [{ "name": 'Jack', "age": 12 },
{ "name": 'John', "age": 13 },
{ "name": 'Dohn', "age": 24 }]
Event.where('payload #> ?', '[{"age": 12}]')
#You can also filter by name key
Event.where('payload #> ?', '[{"name": "John"}]')
#You can also filter by {"name":"Jack", "age":12}
Event.where('payload #> ?', {"name":"Jack", "age":12}.to_json)
You can find more about this here