Read JSON response in Python - json

I am trying to read the JSON response from this link, but it's not working. I get the following error:
ValueError: No JSON object could be decoded.
Here is the code I've tried:
import urllib2, json
a = urllib2.urlopen('https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?key=AIzaSyDkEX-f1JNLQLC164SZaobALqFv4PHV-kA&screenshot=true&snapshots=true&locale=en_US&url=https://www.economicalinsurance.com/en/&strategy=mobile&filter_third_party_resources=false&callback=_callbacks_._DElanZU7Xh1K')
data = json.loads(a)
I made these changes:
import requests, json
r=requests.get('https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?key=AIzaSyDkEX-f1JNLQLC164SZaobALqFv4PHV-kA&screenshot=true&snapshots=true&locale=en_US&url=https://www.economicalinsurance.com/en/&strategy=mobile&filter_third_party_resources=false')
json_data = json.loads(r.text)
print json_data['ruleGroups']['USABILITY']['score']
A quick follow-up question about constructing the image link.
This is as far as I got:
from selenium import webdriver
txt = json_data['screenshot']['data']
txt = str(txt).replace('-','/').replace('_','/')
# Then, to construct the image link, I tried:
image_link = 'data:image/jpeg;base64,'+txt
driver = webdriver.Firefox()
driver.get(image_link)
The problem is that I am not getting the image, and len(object_original) differs from len(image_link). Could anybody please advise what is missing from my constructed image link? Thank you.
Here is the API link: https://www.google.co.uk/webmasters/tools/mobile-friendly/ (sorry for adding it late).

Two corrections need to be made to your code:
The URL needs correcting (as mentioned by Felix Kling here): remove the callback parameter from the GET request you were sending. With that parameter, the API wraps the JSON in a JavaScript function call (JSONP), which json.loads() cannot parse.
Also, if you check the type of the response you were fetching earlier, you'll notice that it isn't a string; it is <type 'instance'>. Since json.loads() expects a string, you would have got another error. Use a.read() to get the response data as a string.
Hence, this should be your code:
import urllib2, json
a = urllib2.urlopen('https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?key=AIzaSyDkEX-f1JNLQLC164SZaobALqFv4PHV-kA&screenshot=true&snapshots=true&locale=en_US&url=https://www.economicalinsurance.com/en/&strategy=mobile&filter_third_party_resources=false')
data = json.loads(a.read())
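As a rough Python 3 sketch of the first correction (the code in this answer is Python 2), the callback parameter could also be dropped programmatically rather than by editing the URL by hand. The helper name strip_callback and the shortened example URL are illustrative assumptions, not part of the original code.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_callback(url, param="callback"):
    """Return url without the given query parameter (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every query pair except the JSONP callback
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Shortened, made-up URL in the same shape as the one in the question
example = ("https://api.example.com/mobileReady"
           "?strategy=mobile&callback=_callbacks_._abc&locale=en_US")
clean_url = strip_callback(example)
print(clean_url)
```

The cleaned URL can then be passed to urlopen as before. Note that urlencode percent-encodes the remaining values again, which matters for parameters that themselves contain a URL.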
The answer to your second question (regarding the image) is:
from base64 import b64decode
arr = json_data['screenshot']['data']
# Map the URL-safe base64 characters ('_', '-') back to the standard ones ('/', '+')
arr = arr.replace("_", "/")
arr = arr.replace("-", "+")
with open("imageToSave.jpeg", "wb") as fh:
    fh.write(b64decode(str(arr)))
Here is the image you were trying to fetch - Link
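Replacing "_" with "/" and "-" with "+" maps the URL-safe base64 alphabet back to the standard one. As a minimal Python 3 sketch, assuming the screenshot data really is URL-safe base64, the standard library's base64.urlsafe_b64decode does the same mapping in one call. The sample bytes are made up and stand in for json_data['screenshot']['data']; decode_urlsafe is a hypothetical helper name.

```python
import base64


def decode_urlsafe(data):
    """Decode URL-safe base64 text to bytes (hypothetical helper)."""
    # Re-add '=' padding in case it was stripped from the payload
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded)


# Made-up payload standing in for json_data['screenshot']['data']
raw = b"\xff\xd8\xff\xe0\xfb\xef\xbe fake image bytes"
sample = base64.urlsafe_b64encode(raw).decode("ascii")

image_bytes = decode_urlsafe(sample)

# The manual replace-then-decode approach gives the same bytes
manual = base64.b64decode(sample.replace("_", "/").replace("-", "+"))
print(image_bytes == manual)
```

Writing image_bytes to a file opened in "wb" mode would then produce the image, as in the answer above.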

Felix Kling is right about the address, but I also created a variable that holds the URL. You can try this out too, and it should work:
import urllib2, json
url = "https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?key=AIzaSyDkEX-f1JNLQLC164SZaobALqFv4PHV-kA&screenshot=true&snapshots=true&locale=en_US&url=https://www.economicalinsurance.com/en/&strategy=mobile&filter_third_party_resources=false"
response = urllib2.urlopen(url)
data = json.loads(response.read())
print data

Related

How to navigate through a json file with Python 3? TypeError: list indices must be integers or slices, not str

I am trying to collect as many profile links as I can from khanacademy.org, using their API.
I am struggling to navigate through the JSON to get the desired data.
Here is my code:
from urllib.request import urlopen
import json
with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()
data = json.loads(source)
for item in data['feedback']:
    print(item['authorKaid'])
    profile_answers = item['answers']['authorKaid']
    print(profile_answers)
My goal is to get as many authorKaid values as possible and then store them (to create a database later).
When I run this code I get this error:
TypeError: list indices must be integers or slices, not str
I don't understand why; in this tutorial video (https://www.youtube.com/watch?v=9N6a-VLBa2I), at 16:10, the same approach works.
The issue is that item['answers'] is a list, and you are trying to index it with a string rather than an integer. That is why item['answers']['authorKaid'] raises the error.
What you really want is
print (item['answers'][0]['authorKaid'])
print (item['answers'][1]['authorKaid'])
print (item['answers'][2]['authorKaid'])
and so on.
So you actually want to iterate through those lists. Try this:
from urllib.request import urlopen
import json
with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()
data = json.loads(source)
for item in data['feedback']:
    print(item['authorKaid'])
    for each in item['answers']:
        profile_answers = each['authorKaid']
        print(profile_answers)
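Since the stated goal is to store the IDs later, here is a minimal sketch of the same nested iteration that collects them into a set instead of printing them. The sample data is made up and only mimics the shape described in the question (a "feedback" list whose items carry an "answers" list); collect_kaids is a hypothetical helper name.

```python
# Made-up sample shaped like the response described in the question
data = {
    "feedback": [
        {
            "authorKaid": "kaid_1",
            "answers": [{"authorKaid": "kaid_2"}, {"authorKaid": "kaid_3"}],
        },
        {"authorKaid": "kaid_2", "answers": []},
    ]
}


def collect_kaids(payload):
    """Gather every authorKaid from questions and answers (hypothetical helper)."""
    kaids = set()
    for item in payload.get("feedback", []):
        kaids.add(item["authorKaid"])
        # item["answers"] is a list, so loop over it rather than indexing by key
        for answer in item.get("answers", []):
            kaids.add(answer["authorKaid"])
    return kaids


print(sorted(collect_kaids(data)))
```

A set removes duplicates automatically, which helps when the same user both asks and answers.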

Many errors when trying to load and work with json in python

I have tried using .read() and .decode("utf-8"), but I just keep getting errors like this: 'TypeError: can only concatenate str (not "dict") to str'
from requests import get
import json
url = 'http://taco-randomizer.herokuapp.com/random/?full-taco=true'
requested_taco = get(url)
requested_taco_data = json.loads(requested_taco.read())
title = requested_taco_data['name']
Thank you in advance to anyone who is able to help me figure out how to get the json to become a dictionary in python.
There is no response.read() in Requests; instead you should use response.json(), like so:
taco = requested_taco.json()
print(taco['name'])
which would output:
'Black Bean, Potato, and Onion Tacos'
There is no need for the json library.

Unable to understand and parse the JSON URL response

I have a JSON URL and I am trying to extract data from the response. Below is my code:
url = urllib2.urlopen("https://i1.adis.ws/s/foo/M0011126_001_SET.js?func=app.mjiProduct.handleJSON&protocol=https")
content = url.read()
soup = BeautifulSoup(content, "html.parser")
print(soup.prettify())
print(soup.items)
newDictionary=json.loads(str(soup))
Below is the response content:
app.mjiProduct.handleJSON({"name":"M0011126_001_SET","items":[{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_MAIN","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT1","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT2","width":3200,"height":4800,"format":"TIFF","opaque":"true"}]});
I am new to JSON and unable to understand the response. I need to parse the response as JSON, or in some other form, to extract the image sources, but the above code gives me the error below.
No JSON object could be decoded
Can anyone please guide me? Thanks.
First of all, your URL isn't working; it returns app.mjiProduct.handleJSON({"status":"error","errorMsg":"Failed to get set"});
Second, you don't have to pass the content to BeautifulSoup. You can pass it directly to json, as I did in the code below, without the BeautifulSoup object.
I used httpbin to test, but this should work with your URL. I used Python 3, though.
from urllib.request import urlopen
import json
url = urlopen("http://httpbin.org/get")
content = url.read()
newDictionary=json.loads(content)
print(newDictionary)
output: {'args': {}, 'headers': {'Accept-Encoding': 'identity', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'Python-urllib/3.6'}, 'origin': '', 'url': 'http://httpbin.org/get'}
Below is the code that worked for me.
json_data = url.read()
# Keep only the text between 'handleJSON(' and the closing ');'
purify_data = json_data.split('handleJSON(')[1].split(');')[0]
loaded_json = json.loads(purify_data)
print(loaded_json['items'][0]['src'])
I figured out that json_data was a string that could not be decoded because of its format, which was
app.mjiProduct.handleJSON(REQUIRED JSON)
So first I filtered the string and then loaded it with json, and the problem was solved.
The response doesn't contain valid JSON. It looks like executable code (probably JavaScript). But the part {"name":"M0011126_001_SET","items":[...]} is valid JSON. So if you know for sure that the response always has this format, you can strip the function call like this:
content = url.read()[26:-2] # Cut first 26 characters and last two
newDictionary=json.loads(str(content))
I don't know much about Beautiful Soup, but from what I can find it's a library for processing HTML files. Your response is not HTML, so I think you shouldn't use it here.
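Both approaches above (splitting on 'handleJSON(' and slicing off a fixed number of characters) depend on the exact callback name. As a hedged Python 3 sketch, a regular expression can strip any single wrapping function call instead. The helper name parse_jsonp is hypothetical, and the body below is a shortened copy of the response shown in the question.

```python
import json
import re


def parse_jsonp(text):
    """Extract and parse the JSON argument of one JSONP call (hypothetical helper)."""
    # Match callbackName( ... ) with an optional trailing semicolon
    match = re.fullmatch(r"\s*[\w.$]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP-style response")
    return json.loads(match.group(1))


# Shortened version of the response shown in the question
body = ('app.mjiProduct.handleJSON({"name":"M0011126_001_SET","items":'
        '[{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_MAIN"}]});')
parsed = parse_jsonp(body)
sources = [item["src"] for item in parsed["items"]]
print(sources)
```

In Python 3, url.read() returns bytes, so the content would need .decode('utf-8') before being passed to a helper like this.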

Error parsing JSON file in python 3.4

I am trying to load a JSON file from a URL and parse it in Python 3.4, but I get a few errors and I have no idea what they are pointing to. I verified the JSON file at the URL with jsonlint.com and it seems fine. data.read() returns bytes, and I have typecast it. The code is:
import urllib.request
import json
inp = input("enter url :")
if len(inp)<1: inp ='http://python-data.dr-chuck.net/comments_42.json'
data=urllib.request.urlopen(inp)
data_str = str(data.read())
print(type(data_str))
parse_data = json.loads(data_str)
print(type(parse_data))
The error that i'm getting is:
The expression str(data.read()) doesn't "cast" your bytes into a string, it just produces a string representation of them. This can be seen if you print data_str: it's a str beginning with b'.
To actually decode the JSON, you need to do data_str = data.read().decode('utf-8')
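The difference between str() and .decode() on bytes can be seen in a small offline sketch. The bytes literal below is made up and stands in for what data.read() returns.

```python
import json

# Made-up bytes standing in for data.read()
raw = b'{"comments": [{"name": "Ann", "count": 42}]}'

as_repr = str(raw)             # the repr of the bytes object, starting with b'
as_text = raw.decode("utf-8")  # the actual text content

try:
    json.loads(as_repr)
    repr_parsed = True
except ValueError:
    # json.loads rejects the b'...' wrapper because it is not valid JSON
    repr_parsed = False

parsed = json.loads(as_text)
print(as_repr[:2], repr_parsed, parsed["comments"][0]["count"])
```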

'HTTPResponse' object has no attribute 'decode'

I was initially getting the following error when trying to run the code below:
Error: the JSON object must be str, not 'bytes'
import urllib.request
import json
search = '230 boulder lane cottonwood az'
search = search.replace(' ','%20')
places_api_key = 'AIzaSyDou2Q9Doq2q2RWJWncglCIt0kwZ0jcR5c'
url = 'https://maps.googleapis.com/maps/api/place/textsearch/json?query='+search+'&key='+places_api_key
json_obj = urllib.request.urlopen(url)
data = json.load(json_obj)
for item in data['results']:
    print(item['formatted_address'])
    print(item['types'])
After making some troubleshooting changes like these:
json_obj = urllib.request.urlopen(url)
obj = json.load(json_obj)
data = json_obj .readall().decode('utf-8')
Error - 'HTTPResponse' object has no attribute 'decode'
I am getting the error above. I have tried the suggestions from multiple posts on Stack Overflow, but nothing seems to work. I have posted the entire code; if anyone can get it to work I'll be very grateful. What I don't understand is why the same thing worked for others and not for me.
Thanks!
urllib.request.urlopen returns an HTTPResponse object, which cannot be JSON-decoded directly (because it is a byte stream).
So you'll instead want:
# Convert from bytes to text
resp_text = urllib.request.urlopen(url).read().decode('UTF-8')
# Use loads to decode from text
json_obj = json.loads(resp_text)
However, if you print resp_text from your example, you'll notice it is actually XML, so you'll want an XML reader:
resp_text = urllib.request.urlopen(url).read().decode('UTF-8')
(Pdb) print(resp_text)
<?xml version="1.0" encoding="UTF-8"?>
<PlaceSearchResponse>
<status>OK</status>
...
Update (Python 3.6+)
In Python 3.6+, json.load can take a byte stream (and json.loads can take a byte string).
This is now valid:
json_obj = json.load(urllib.request.urlopen(url))
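The Python 3.6+ behaviour can be checked without a network call. In this sketch, io.BytesIO stands in for the HTTPResponse returned by urlopen, on the assumption that all json.load needs is a read() method returning bytes. The payload is made up.

```python
import io
import json

# io.BytesIO stands in for the HTTPResponse object; the payload is made up
fake_response = io.BytesIO(
    b'{"results": [{"formatted_address": "1 Example St", '
    b'"types": ["street_address"]}]}'
)

data = json.load(fake_response)        # byte stream, Python 3.6+
same = json.loads(b'{"results": []}')  # byte string, Python 3.6+

for item in data["results"]:
    print(item["formatted_address"], item["types"])
```

On Python 3.5 and earlier, the same calls raise the "JSON object must be str, not 'bytes'" error from the question, so an explicit .decode('utf-8') is still needed there.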