I retrieved a dataset from a news API in JSON format. I want to extract the news description from the JSON data.
This is my code:-
import requests
import json
url = ('http://newsapi.org/v2/top-headlines?'
'country=us&'
'apiKey=608bf565c67f4d99994c08d74db82f54')
response = requests.get(url)
di=response.json()
di = json.dumps(di)
for di['articles'] in di:
print(article['title'])
The dataset looks like this:-
{'status': 'ok',
'totalResults': 38,
'articles': [
{'source':
{'id': 'the-washington-post',
'name': 'The Washington Post'},
'author': 'Derek Hawkins, Marisa Iati',
'title': 'Coronavirus updates: Texas, Florida and Arizona officials say early reopenings fueled an explosion of cases - The Washington Post',
'description': 'Local officials in states with surging coronavirus cases issued dire warnings Sunday about the spread of infections, saying the virus was rapidly outpacing containment efforts.',
'url': 'https://www.washingtonpost.com/nation/2020/07/05/coronavirus-update-us/',
'urlToImage': 'https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/K3UMAKF6OMI6VF6BNTYRN77CNQ.jpg&w=1440',
'publishedAt': '2020-07-05T18:32:44Z',
'content': 'Here are some significant developments:\r\n<ul><li>The rolling seven-day average for daily new cases in the United States reached a record high for the 27th day in a row, climbing to 48,606 on Sunday, … [+5333 chars]'}]}
Please guide me with this!
Your code needs a few corrections. The code below should work. I have removed the API key from the answer, so make sure you add one before testing.
import requests

url = ('http://newsapi.org/v2/top-headlines?'
       'country=us&'
       'apiKey=<API KEY>')
response = requests.get(url)
# response.json() already returns a Python dict, so there is no need to
# serialize it back into a string with json.dumps()
di = response.json()
# Your loop was not defined correctly; iterate over the list stored under 'articles'
for article in di['articles']:
    print(article['title'])
response.json() returns:
{'status': 'ok',
'totalResults': 38,
'articles': [
{'source':
{'id': 'the-washington-post',
'name': 'The Washington Post'},
'author': 'Derek Hawkins, Marisa Iati',
'title': 'Coronavirus updates: Texas, Florida and Arizona officials say early reopenings fueled an explosion of cases - The Washington Post',
'description': 'Local officials in states with surging coronavirus cases issued dire warnings Sunday about the spread of infections, saying the virus was rapidly outpacing containment efforts.',
'url': 'https://www.washingtonpost.com/nation/2020/07/05/coronavirus-update-us/',
'urlToImage': 'https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/K3UMAKF6OMI6VF6BNTYRN77CNQ.jpg&w=1440',
'publishedAt': '2020-07-05T18:32:44Z',
'content': 'Here are some significant developments:\r\n<ul><li>The rolling seven-day average for daily new cases in the United States reached a record high for the 27th day in a row, climbing to 48,606 on Sunday, … [+5333 chars]'}]}
Code:
di = response.json()  # 'di' is a dictionary (key-value pairs)
for i in di["articles"]:
    print(i["description"])
"articles" is one of the keys of the dictionary di, and its corresponding value is a list. Each element of that list is itself a dictionary (key-value pairs) describing one article. "description", which you are looking for, is a key in each of those dictionaries, so you can read it with i["description"].
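The same access pattern can be sketched without a network call. This is only an illustration: `sample` is a hypothetical, shortened stand-in for the response shown above, `extract_descriptions` is a made-up helper name, and the `None` description is an assumption about what a missing value might look like.

```python
# Hypothetical sample shaped like the response shown above (values shortened).
sample = {
    "status": "ok",
    "totalResults": 2,
    "articles": [
        {"title": "First headline", "description": "First summary."},
        {"title": "Second headline", "description": None},
    ],
}


def extract_descriptions(payload):
    """Return the non-empty descriptions from a NewsAPI-style payload."""
    descriptions = []
    # .get() with a default avoids a KeyError when 'articles' is absent.
    for article in payload.get("articles", []):
        text = article.get("description")
        if text:  # skip missing, None, or empty descriptions
            descriptions.append(text)
    return descriptions


descriptions = extract_descriptions(sample)
print(descriptions)
```

Using .get() instead of square brackets is a design choice here: it trades a loud KeyError for silently skipping incomplete articles, which may or may not be what you want.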
Related
I have JSON output that I would like to convert to a pandas DataFrame. I downloaded it from a website over HTTPS using an API key. Here is what I coded:
json_data = vehicle_miles_traveled.json()
print(json_data)
{'request': {'command': 'series', 'series_id': 'STEO.MVVMPUS.A'}, 'series': [{'series_id': 'STEO.MVVMPUS.A', 'name': 'Vehicle Miles Traveled, Annual', 'units': 'million miles/day', 'f': 'A', 'description': 'Includes gasoline and diesel fuel vehicles', 'copyright': 'None', 'source': 'U.S. Energy Information Administration (EIA) - Short Term Energy Outlook', 'geography': 'USA', 'start': '1990', 'end': '2023', 'lastHistoricalPeriod': '2021', 'updated': '2022-03-08T12:39:35-0500', 'data': [['2023', 9247.0281671], ['2022', 9092.4575671], ['2021', 8846.1232877], ['2020', 7933.3907104], ['2019', 8936.3589041], ['2018', 8877.6027397], ['2017', 8800.9479452], ['2016', 8673.2431694], ['2015', 8480.4712329], ['2014', 8289.4684932], ['2013', 8187.0712329], ['2012', 8110.8387978], ['2011', 8083.2931507], ['2010', 8129.4958904], ['2009', 8100.7205479], ['2008', 8124.3387978], ['2007', 8300.8794521], ['2006', 8257.8520548], ['2005', 8190.2136986], ['2004', 8100.5163934], ['2003', 7918.4136986], ['2002', 7823.3123288], ['2001', 7659.2054795], ['2000', 7505.2622951], ['1999', 7340.9808219], ['1998', 7192.7780822], ['1997', 7014.7205479], ['1996', 6781.9699454], ['1995', 6637.7369863], ['1994', 6459.1452055], ['1993', 6292.3424658], ['1992', 6139.7595628], ['1991', 5951.2712329], ['1990', 5883.5643836]]}]}
It depends heavily on your final goal. You could put all the metadata into a DataFrame if you want to. I assume that you are interested in reading the data field into a DataFrame.
We can just get those fields by accessing:
import pandas as pd

# The rows live in the 'data' field of the first (and only) series
data = json_data['series'][0]['data']
# Pass them to the DataFrame constructor. We can specify the column names as well!
df = pd.DataFrame(data, columns=['year', 'other_col_name'])
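In the output above, each row of the data field stores the year as a string, and the rows arrive newest first. A minimal sketch of cleaning that up with plain Python before building a DataFrame follows. The `json_data` dict here is a hypothetical three-row excerpt of the real output, and the record keys `year` and `value` are names chosen for illustration.

```python
# Hypothetical excerpt of json_data, trimmed to three rows from the output above.
json_data = {
    "series": [
        {
            "series_id": "STEO.MVVMPUS.A",
            "units": "million miles/day",
            "data": [
                ["2023", 9247.0281671],
                ["2022", 9092.4575671],
                ["2021", 8846.1232877],
            ],
        }
    ]
}

series = json_data["series"][0]
# Turn each [year, value] pair into a typed record, oldest year first.
records = sorted(
    ({"year": int(year), "value": value} for year, value in series["data"]),
    key=lambda row: row["year"],
)
for row in records:
    print(row["year"], row["value"], series["units"])
```

A list of dicts like `records` can also be handed to the DataFrame constructor, in which case the dict keys become the column names.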
I'm having a problem with my code. No errors occur, but for some reason I'm not getting the desired outcome.
This is the 'json' data that the user will receive
books = [
{'id': 0,
'title': 'A fire Upon the Deep',
'author': 'Vernor Vinge',
'first_sentence': 'The coldsleep itself was dreamless.',
'year_published': '1992'},
{'id': 1,
'title': 'The Ones Who Walk Away From Omelas',
'author': 'Ursula K. Le Guin',
'first_sentence': 'With a clamor of bells that set the swallows soaring, the \
Festival of Summer came to the city Omelas, bright-towered by the sea.',
'published': '1973'},
{'id': 2,
'title': 'Dhalgren',
'author': 'Samuel R. Delany',
'first_sentence': 'to wound the autumnal city.',
'published': '1975'}
]
If no ID exists, it returns an error message, which it's not supposed to do, seeing as each book has an ID.
@app.route('/api/v1/resource/books', methods=['GET'])
def api_id():
    if 'id' in request.args:
        id = int(request.args['id'])
    else:
        return "Error: ID not provided. Please specify an ID"
    results = []
    for book in books:
        if book['id'] == id:
            results.append(book)
    return jsonify(results)
The request URL will probably be something like http://localhost:5000/api/v1/resource/books/?id=1
For multiple book IDs, it will be something like http://localhost:5000/api/v1/resource/books/?id=1&id=2
from flask import Flask, request, jsonify
app = Flask(__name__)
books = [
{'id': 0,
'title': 'A fire Upon the Deep',
'author': 'Vernor Vinge',
'first_sentence': 'The coldsleep itself was dreamless.',
'year_published': '1992'},
{'id': 1,
'title': 'The Ones Who Walk Away From Omelas',
'author': 'Ursula K. Le Guin',
'first_sentence': 'With a clamor of bells that set the swallows soaring, the \
Festival of Summer came to the city Omelas, bright-towered by the sea.',
'published': '1973'},
{'id': 2,
'title': 'Dhalgren',
'author': 'Samuel R. Delany',
'first_sentence': 'to wound the autumnal city.',
'published': '1975'}
]
@app.route("/api/v1/resource/books/", methods=["GET"])
def api_id():
    # Collect every 'id' value from the query string. getlist() returns an
    # empty list when no id is given
    bids = request.args.getlist('id')
    # if you are expecting to serve only one id at a time, use >> request.args.get('id')
    if bids:
        results = []
        for book in books:
            for bid in bids:  # this for loop won't be required for strictly single id
                if book['id'] == int(bid):
                    results.append(book)
        return jsonify(results)
    else:
        return "Error: ID not provided. Please specify an ID"


if __name__ == "__main__":
    app.run()
This code is working for me. I hope it does for you as well.
Request Url:
http://localhost:5000/api/v1/resource/books/?id=0&id=2
Output:
[{"author":"Vernor Vinge","first_sentence":"The coldsleep itself was dreamless.","id":0,"title":"A fire Upon the Deep","year_published":"1992"},{"author":"Samuel R. Delany","first_sentence":"to wound the autumnal city.","id":2,"published":"1975","title":"Dhalgren"}]
Request Url:
http://localhost:5000/api/v1/resource/books/?id=0
Output:
[{"author":"Vernor Vinge","first_sentence":"The coldsleep itself was dreamless.","id":0,"title":"A fire Upon the Deep","year_published":"1992"}]
Request Url:
http://localhost:5000/api/v1/resource/books/
Output:
Error: ID not provided. Please specify an ID
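The matching logic can also be pulled out of the view so it can be tried without starting Flask. The following is a rough sketch under stated assumptions: `find_books` is a hypothetical helper, the `books` list is a trimmed copy of the one above, and silently ignoring non-numeric IDs is one possible policy, not something the answer above does (its int(bid) call would raise on such input).

```python
# Hypothetical, trimmed copy of the books list used above.
books = [
    {"id": 0, "title": "A fire Upon the Deep"},
    {"id": 1, "title": "The Ones Who Walk Away From Omelas"},
    {"id": 2, "title": "Dhalgren"},
]


def find_books(raw_ids, catalog):
    """Return the books whose id matches any of the raw query-string values."""
    wanted = set()
    for raw in raw_ids:
        try:
            wanted.add(int(raw))
        except ValueError:
            # Ignore values such as "abc" instead of raising inside the view.
            continue
    # Keep catalog order and return each matching book once.
    return [book for book in catalog if book["id"] in wanted]


print(find_books(["0", "2"], books))
print(find_books(["abc"], books))
```

Inside the view, the helper would be called with the list returned by request.args.getlist('id'), and its result passed to jsonify.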
I have a dataframe with a column containing JSON, where one record looks like this:
player_feedback
{'player': '1b87a117-09ef-41e2-8710-6bc144760a74',
'feedback': [{'answer': [{'id': '1-6gaincareerinfo', 'content': 'To gain career information'},
{'id': '1-5proveskills', 'content': 'Opportunity to prove skills by competing '},
{'id': '1-1diff', 'content': 'Try something different'}], 'question': 1},
{'answer': [{'id': '2-2skilldev', 'content': 'Skill development'}], 'question': 2},
{'answer': [{'id': '3-6exploit', 'content': 'Exploitation'},
{'id': '3-1forensics', 'content': 'Forensics'}], 'question': 3},
{'answer': 'verygood', 'question': 4},
{'answer': 'poor', 'question': 5}, ... ... ,
{'answer': 'verygood', 'question': 15}]}
Here are the first 5 rows of the data.
I want to convert this column to separate columns like -
player                                  Question 1                                   Question 2          ...   Question 15
1b87a117-09ef-41e2-8710-6bc144760a74    To gain career information,                  Skill development         verygood
                                        Opportunity to prove skills by competing,
                                        Try something different
I started with -
df_survey_responses['player_feedback'].apply(ast.literal_eval).values.tolist()
but that only gets me the player id in a separate field and the feedback in another. As far as I can tell, json_normalize would give me a similar result. How can I do this recursively to get my desired result, or is there a better way to do this?
Thanks!
You can use a JSON flattener like this one:
def flatten_json(nested_json):
    """
    Flatten json object with nested keys into a single level.
    Args:
        nested_json: A nested json object.
    Returns:
        The flattened json object as a dict.
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            # Recurse into each key, extending the prefix with the key name
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            # Recurse into each element, extending the prefix with its index
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            # Leaf value: drop the trailing '_' from the accumulated prefix
            out[name[:-1]] = x

    flatten(nested_json)
    return out
Which gives dataframes that look like this:
0
player 34a8eb8a-056f-4568-88dc-8736056819a3
feedback_0_answer_0_id 1-5proveskills
feedback_0_answer_0_content Opportunity to prove skills by competing
feedback_0_question 1
feedback_1_answer_0_id 2-1networking
feedback_1_answer_0_content Networking
feedback_1_answer_1_id 2-2skilldev
feedback_1_answer_1_content Skill development
feedback_1_question 2
feedback_2_answer_0_id 3-5boottoroot
feedback_2_answer_0_content Boot2root
feedback_2_answer_1_id 3-6exploit
feedback_2_answer_1_content Exploitation
feedback_2_question 3
feedback_3_answer good
feedback_3_question 4
feedback_4_answer good
feedback_4_question 5
feedback_5_answer selfchose
feedback_5_question 6
feedback_6_answer pairs
feedback_6_question 7
feedback_7_answer_0_id 7-persistence
feedback_7_answer_0_content Persistence
feedback_7_question 8
feedback_8_answer social
feedback_8_question 9
feedback_9_answer training
feedback_9_question 10
feedback_10_answer yes
feedback_10_question 11
feedback_11_answer yes
feedback_11_question 12
feedback_12_answer yes
feedback_12_question 13
feedback_13_answer yes
feedback_13_question 14
feedback_14_answer verygood
feedback_14_question 15
feedback_15_answer yes
feedback_15_question 16
feedback_16_answer yes
feedback_16_question 17
feedback_17_answer It would be good to have more exploitation one...
feedback_17_question 18
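The flattened keys above are indexed by position rather than by question number. If the goal is the layout from the question (one "Question N" column per player), a different reshaping may be closer. The sketch below is one possible approach, not part of the answer above: `feedback_to_row` is a hypothetical helper, `record` is a trimmed copy of the example record, and joining multiple choices with a comma is an assumption based on the desired output shown in the question.

```python
# Hypothetical record, trimmed from the example in the question.
record = {
    "player": "1b87a117-09ef-41e2-8710-6bc144760a74",
    "feedback": [
        {"answer": [{"id": "1-6gaincareerinfo", "content": "To gain career information"},
                    {"id": "1-1diff", "content": "Try something different"}],
         "question": 1},
        {"answer": [{"id": "2-2skilldev", "content": "Skill development"}],
         "question": 2},
        {"answer": "verygood", "question": 4},
    ],
}


def feedback_to_row(rec):
    """Build one flat dict per player with a 'Question N' key per answer."""
    row = {"player": rec["player"]}
    for item in rec["feedback"]:
        answer = item["answer"]
        if isinstance(answer, list):
            # Multiple-choice answers: join the human-readable contents.
            answer = ", ".join(choice["content"] for choice in answer)
        row["Question {}".format(item["question"])] = answer
    return row


row = feedback_to_row(record)
print(row)
```

Applying the helper to every parsed record gives a list of flat dicts, which the pandas DataFrame constructor accepts directly, with one column per key.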
I have this dictionary (or so type() tells me):
{'uploadedby': 'fred',
'return_url': '',
'id': '2200',
'question_json': '{"ops":[{"insert":"What metal is responsible for a Vulcan\'s green blood?\\n"}]}'}
When I use json.dumps on it, I get this:
{"uploadedby": "fred",
"return_url": "",
"id": "2200",
"question_json": "{\"ops\":[{\"insert\":\"What metal is responsible for a Vulcan's green blood?\\n\"}]}", "question": "What metal is responsible for a Vulcan's green blood?\r\n"}
I don't want all the escaping that's going on. Is there something I can do to correct this?
You can do something like the following to convert question_json into a python dict, and then dump the entire dict:
import json

test = {'uploadedby': 'fred',
        'return_url': '',
        'id': '2200',
        'question_json': '{"ops":[{"insert":"What metal is responsible for a Vulcan\'s green blood?\\n"}]}'}
# Decode the embedded JSON string first, then serialize the whole dict once
json.dumps(
    {k: json.loads(v) if k == 'question_json' else v for k, v in test.items()}
)
'{"question_json": {"ops": [{"insert": "What metal is responsible for a Vulcan\'s green blood?\\n"}]}, "uploadedby": "fred", "return_url": "", "id": "2200"}'
You could try the following, which has the added benefit of not needing to specify which key contains the offending value. Here we're checking to see if we can effectively load a JSON string from any of the key-value pairs and leaving them alone if that fails.
import json

mydict = {'uploadedby': 'fred',
          'return_url': '',
          'id': '2200',
          'question_json': '{"ops":[{"insert":"What metal is responsible for a Vulcan\'s green blood?\\n"}]}'}
for key in mydict:
    try:
        mydict[key] = json.loads(mydict[key])
    except (TypeError, ValueError):
        # Not a JSON string, so leave the value unchanged
        pass
After the loop, mydict holds the following, so json.dumps(mydict) no longer escapes the nested JSON. The offending key is fixed and the plain strings are as they were:
{'uploadedby': 'fred',
'return_url': '',
'id': 2200,
'question_json': {'ops': [{'insert': "What metal is responsible for a Vulcan's green blood?\n"}]}}
Note that the id value has been converted to an int, which may or may not be your intent. It's hard to tell from the original question.
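If the int conversion is not wanted, one option is to accept a decoded value only when it turns out to be a JSON object or array. This is a sketch of that idea, with `decode_nested_json` as a hypothetical helper name. It leaves the original dict untouched and returns a new one.

```python
import json

mydict = {
    "uploadedby": "fred",
    "return_url": "",
    "id": "2200",
    "question_json": '{"ops":[{"insert":"What metal is responsible for a Vulcan\'s green blood?\\n"}]}',
}


def decode_nested_json(mapping):
    """Decode string values that hold a JSON object or array; keep the rest."""
    decoded = {}
    for key, value in mapping.items():
        decoded[key] = value
        if isinstance(value, str):
            try:
                parsed = json.loads(value)
            except ValueError:
                continue  # not JSON at all, keep the original string
            if isinstance(parsed, (dict, list)):
                decoded[key] = parsed  # only containers replace the string
    return decoded


result = decode_nested_json(mydict)
print(json.dumps(result))
```

With this variant the id stays the string '2200', because a bare number is valid JSON but is neither a dict nor a list.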
I'm new to programming and am trying to parse some data returned from Yelp's API. From this data, how could I return something like just the phone number (display_phone) and address? Thank you
Result for business "little-miss-bbq-phoenix-2" found:
{ u'categories': [[u'Barbeque', u'bbq']],
u'display_phone': u'+1-602-437-1177',
u'id': u'little-miss-bbq-phoenix-2',
u'image_url': u'http://s3-media2.fl.yelpcdn.com/bphoto/4Rcm0IIbRhdo-4Z4KPvuXQ/ms.jpg',
u'is_claimed': True,
u'is_closed': False,
u'location': { u'address': [u'4301 E University Dr'],
u'city': u'Phoenix',
u'coordinate': { u'latitude': 33.421587,
u'longitude': -111.989088},
u'country_code': u'US',
u'display_address': [ u'4301 E University Dr',
u'Phoenix, AZ 85034'],
u'geo_accuracy': 9.5,
u'postal_code': u'85034',
u'state_code': u'AZ'},
u'mobile_url': u'http://m.yelp.com/biz/little-miss-bbq-phoenix-2',
u'name': u'Little Miss BBQ',
u'phone': u'6024371177',
u'rating': 5.0,
u'rating_img_url': u'http://s3-media1.fl.yelpcdn.com/assets/2/www/img/f1def11e4e79/ico/stars/v1/stars_5.png',
u'rating_img_url_large': u'http://s3-media3.fl.yelpcdn.com/assets/2/www/img/22affc4e6c38/ico/stars/v1/stars_large_5.png',
u'rating_img_url_small': u'http://s3-media1.fl.yelpcdn.com/assets/2/www/img/c7623205d5cd/ico/stars/v1/stars_small_5.png',
u'review_count': 403,
u'reviews': [ { u'excerpt': u"I saw that this place had almost 400 reviews and that they have a perfect 5 star rating. It sounded too good to be true BUT it's worth every star and...",
u'id': u'-9poa0ycpVnOveVlqbYE9Q',
u'rating': 5,
u'rating_image_large_url': u'http://s3-media3.fl.yelpcdn.com/assets/2/www/img/22affc4e6c38/ico/stars/v1/stars_large_5.png',
u'rating_image_small_url': u'http://s3-media1.fl.yelpcdn.com/assets/2/www/img/c7623205d5cd/ico/stars/v1/stars_small_5.png',
u'rating_image_url': u'http://s3-media1.fl.yelpcdn.com/assets/2/www/img/f1def11e4e79/ico/stars/v1/stars_5.png',
u'time_created': 1431095420,
u'user': { u'id': u'd43iQ50HjWIl4vN4rBgoVQ',
u'image_url': u'http://s3-media4.fl.yelpcdn.com/photo/sn17KBbRjXOEELnjSir1tg/ms.jpg',
u'name': u'Jason J.'}}],
u'snippet_image_url': u'http://s3-media4.fl.yelpcdn.com/photo/sn17KBbRjXOEELnjSir1tg/ms.jpg',
u'snippet_text': u"I saw that this place had almost 400 reviews and that they have a perfect 5 star rating. It sounded too good to be true BUT it's worth every star and...",
u'url': u'http://www.yelp.com/biz/little-miss-bbq-phoenix-2'}
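Since the result above is a nested dictionary, the two values can be reached by key. The following is a minimal sketch, assuming the result is bound to a variable called `business` (a hypothetical name). The dict below is a trimmed copy of the output above, and joining the address lines with a comma is just one way to present them.

```python
# Hypothetical, trimmed copy of the result printed above.
business = {
    "display_phone": "+1-602-437-1177",
    "name": "Little Miss BBQ",
    "location": {
        "address": ["4301 E University Dr"],
        "city": "Phoenix",
        "display_address": ["4301 E University Dr", "Phoenix, AZ 85034"],
    },
}

# Top-level key: read it directly.
phone = business["display_phone"]
# 'location' is a nested dict and 'display_address' is a list of lines,
# so index twice and join the lines into one string.
address = ", ".join(business["location"]["display_address"])

print(phone)
print(address)
```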