I am trying to collect as many profile links as I can on khanacademy.org, using their API.
I am struggling to navigate through the JSON response to get the data I want.
Here is my code:
from urllib.request import urlopen
import json

with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()

data = json.loads(source)

for item in data['feedback']:
    print(item['authorKaid'])
    profile_answers = item['answers']['authorKaid']
    print(profile_answers)
My goal is to get as many authorKaid values as possible and then store them (to create a database later).
When I run this code I get this error:
TypeError: list indices must be integers or slices, not str
I don't understand why; in this tutorial video, https://www.youtube.com/watch?v=9N6a-VLBa2I at 16:10, it works.
The issue is that item['answers'] is a list, so you are trying to index it with a string rather than an integer. That is why item['answers']['authorKaid'] raises the error.
What you really want is
print(item['answers'][0]['authorKaid'])
print(item['answers'][1]['authorKaid'])
print(item['answers'][2]['authorKaid'])
etc...
So you actually want to iterate through those lists. Try this:
from urllib.request import urlopen
import json

with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()

data = json.loads(source)

for item in data['feedback']:
    print(item['authorKaid'])
    for each in item['answers']:
        profile_answers = each['authorKaid']
        print(profile_answers)
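Since the end goal is to store as many authorKaid values as possible, here is a rough sketch of how you might collect them into a set (deduplicated) across several pages. It assumes the page= query parameter pages through results the way it appears to, drops the cache-busting _ parameter, and uses a hypothetical page count; I have not checked how many pages actually exist.

from urllib.request import urlopen
import json

author_kaids = set()
for page in range(5):  # hypothetical page count; adjust as needed
    url = ("https://www.khanacademy.org/api/internal/discussions/video/"
           "what-are-algorithms/questions?casing=camel&limit=10&page={}"
           "&sort=1&lang=en".format(page))
    with urlopen(url) as response:
        data = json.loads(response.read())
    # Collect the question author plus every answer author
    for item in data['feedback']:
        author_kaids.add(item['authorKaid'])
        for answer in item['answers']:
            author_kaids.add(answer['authorKaid'])

print(author_kaids)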
I have tried using .read() and .decode("utf-8"), but I just keep getting errors like this: 'TypeError: can only concatenate str (not "dict") to str'
from requests import get
import json
url = 'http://taco-randomizer.herokuapp.com/random/?full-taco=true'
requested_taco = get(url)
requested_taco_data = json.loads(requested_taco.read())
title = requested_taco_data['name']
Thank you in advance to anyone who can help me figure out how to get the JSON into a dictionary in Python.
There is no .read() method on a Requests response; instead you should use response.json(), like so:
taco = requested_taco.json()
print(taco['name'])
which would output:
'Black Bean, Potato, and Onion Tacos'
There is no need for the json library here.
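Putting that together, a minimal version of your script (same URL as in your question) would look something like this:

from requests import get

url = 'http://taco-randomizer.herokuapp.com/random/?full-taco=true'
requested_taco = get(url)

# .json() parses the response body straight into a Python dict
taco = requested_taco.json()
print(taco['name'])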
I have a JSON URL and I am trying to extract data from the response. Below is my code:
import urllib2
import json
from bs4 import BeautifulSoup

url = urllib2.urlopen("https://i1.adis.ws/s/foo/M0011126_001_SET.js?func=app.mjiProduct.handleJSON&protocol=https")
content = url.read()
soup = BeautifulSoup(content, "html.parser")
print(soup.prettify())
print(soup.items)
newDictionary = json.loads(str(soup))
Below is the response content:
app.mjiProduct.handleJSON({"name":"M0011126_001_SET","items":[{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_MAIN","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT1","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT2","width":3200,"height":4800,"format":"TIFF","opaque":"true"}]});
I am new to JSON and unable to understand the response. I need to parse the response as JSON, or in some other form, to extract the image sources. But the above code gives me the error below:
No JSON object could be decoded
Can anyone please guide me? Thanks.
First of all, your URL isn't working for me; it returns app.mjiProduct.handleJSON({"status":"error","errorMsg":"Failed to get set"});
Second, you don't have to pass the content to BeautifulSoup; you can pass it directly to json.loads, as in my code below, without the BeautifulSoup object.
I used httpbin to test, but this should work with your URL too. Note that I used Python 3.
from urllib.request import urlopen
import json
url = urlopen("http://httpbin.org/get")
content = url.read()
newDictionary = json.loads(content)
print(newDictionary)
output: {'args': {}, 'headers': {'Accept-Encoding': 'identity', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'Python-urllib/3.6'}, 'origin': '', 'url': 'http://httpbin.org/get'}
Below is the code that worked for me.
json_data = url.read()
# Strip the JSONP wrapper: keep only what sits between 'handleJSON(' and ');'
purify_data = json_data.split('handleJSON(')[1].split(');')[0]
loaded_json = json.loads(purify_data)
print(loaded_json['items'][0]['src'])
Actually, I figured out that json_data was a string, and I was unable to decode it because of the format of that string, which was
app.mjiProduct.handleJSON(REQUIRED JSON)
So I first filtered the string and then loaded it with json, and the problem was solved.
The response doesn't contain valid JSON. It looks like executable code (probably JavaScript). But the part {"name":"M0011126_001_SET","items":[...]} is valid JSON. So if you know for sure that the response always has this format, you can strip the function call like this:
content = url.read()[26:-2]  # cut the first 26 characters ("app.mjiProduct.handleJSON(") and the last two (");")
newDictionary = json.loads(str(content))
I don't know Beautiful Soup well, but from what I can tell it is a library for processing HTML files, and your response is not HTML, so I don't think you should use it here.
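If you would rather not depend on the prefix always being exactly 26 characters, here is a slightly more defensive sketch that cuts out whatever sits between the outermost parentheses before decoding. parse_jsonp is just an illustrative helper name, not part of any library:

import json
import re

def parse_jsonp(raw):
    # Pull out the payload between the outermost parentheses of a
    # callbackName({...}); style response, then decode it as JSON.
    match = re.search(r'\((.*)\)\s*;?\s*$', raw, re.DOTALL)
    if match is None:
        raise ValueError('response does not look like JSONP')
    return json.loads(match.group(1))

data = parse_jsonp('app.mjiProduct.handleJSON({"name": "M0011126_001_SET", "items": []});')
print(data['name'])  # M0011126_001_SET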
I am trying to load a JSON file from a URL and parse it on Python 3.4, but I get a few errors and I have no idea what they are pointing to. I did verify the JSON file at the URL with jsonlint.com and it seems fine. data.read() returns bytes, so I have cast it to str. The code is:
import urllib.request
import json

inp = input("enter url :")
if len(inp) < 1: inp = 'http://python-data.dr-chuck.net/comments_42.json'

data = urllib.request.urlopen(inp)
data_str = str(data.read())
print(type(data_str))
parse_data = json.loads(data_str)
print(type(parse_data))
The error that I'm getting is:
The expression str(data.read()) doesn't "cast" your bytes into a string; it just produces a string representation of them. You can see this if you print data_str: it is a str beginning with b'.
To actually decode the JSON, you need to do data_str = data.read().decode('utf-8')
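Putting that together with your example (same URL as in your question), something like this should work:

import urllib.request
import json

inp = input("enter url :")
if len(inp) < 1: inp = 'http://python-data.dr-chuck.net/comments_42.json'

data = urllib.request.urlopen(inp)
data_str = data.read().decode('utf-8')  # bytes -> str
parse_data = json.loads(data_str)
print(type(parse_data))                 # <class 'dict'>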
I was initially getting the following error when trying to run the code below:
Error: the JSON object must be str, not 'bytes'
import urllib.request
import json

search = '230 boulder lane cottonwood az'
search = search.replace(' ', '%20')
places_api_key = 'AIzaSyDou2Q9Doq2q2RWJWncglCIt0kwZ0jcR5c'
url = 'https://maps.googleapis.com/maps/api/place/textsearch/json?query=' + search + '&key=' + places_api_key

json_obj = urllib.request.urlopen(url)
data = json.load(json_obj)

for item in data['results']:
    print(item['formatted_address'])
    print(item['types'])
After making some troubleshooting changes like:
json_obj = urllib.request.urlopen(url)
obj = json.load(json_obj)
data = json_obj.readall().decode('utf-8')
Error - 'HTTPResponse' object has no attribute 'decode'
I am getting the error above. I have tried multiple posts on Stack Overflow, but nothing seems to work. I have posted the entire code above; if anyone can get it to work, I'll be very grateful. What I don't understand is why the same thing worked for others and not for me.
Thanks!
urllib.request.urlopen returns an HTTPResponse object, which cannot be JSON-decoded directly (because it is a byte stream).
So you'll instead want:
# Convert from bytes to text
resp_text = urllib.request.urlopen(url).read().decode('UTF-8')
# Use loads to decode from text
json_obj = json.loads(resp_text)
However, if you print resp_text from your example, you'll notice it is actually XML, so you'll want an XML parser:
resp_text = urllib.request.urlopen(url).read().decode('UTF-8')
(Pdb) print(resp_text)
<?xml version="1.0" encoding="UTF-8"?>
<PlaceSearchResponse>
<status>OK</status>
...
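For instance, with the standard library's xml.etree.ElementTree. The element names below (result, formatted_address) are an assumption based on the JSON version of the API, so check the full response for the exact tags:

import urllib.request
import xml.etree.ElementTree as ET

resp_text = urllib.request.urlopen(url).read().decode('UTF-8')
root = ET.fromstring(resp_text)

print(root.findtext('status'))  # e.g. OK
# Assumed structure: <result> elements containing <formatted_address>
for result in root.iter('result'):
    print(result.findtext('formatted_address'))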
update (python3.6+)
In Python 3.6+, json.load can take a byte stream (and json.loads can take a byte string).
This is now valid:
json_obj = json.load(urllib.request.urlopen(url))