Unable to understand and parse the JSON URL response - json

I have a JSON URL and I am trying to extract data from the response. Below is my code:
import urllib2
import json
from bs4 import BeautifulSoup

url = urllib2.urlopen("https://i1.adis.ws/s/foo/M0011126_001_SET.js?func=app.mjiProduct.handleJSON&protocol=https")
content = url.read()
soup = BeautifulSoup(content, "html.parser")
print(soup.prettify())
print(soup.items)
newDictionary = json.loads(str(soup))
Below is the response content:
app.mjiProduct.handleJSON({"name":"M0011126_001_SET","items":[{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_MAIN","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT1","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT2","width":3200,"height":4800,"format":"TIFF","opaque":"true"}]});
I am new to JSON and unable to understand the response. In addition, I need to parse the response as JSON, or in some other form, to extract the image sources. But the above code gives me the error below.
No JSON object could be decoded
Can anyone please guide me? Thanks

First of all, your URL isn't working; it returns app.mjiProduct.handleJSON({"status":"error","errorMsg":"Failed to get set"});
Second, you don't have to pass the content to BeautifulSoup; you can pass it directly to json, as I did in my code below, without the BeautifulSoup object.
I used httpbin to test, but this should work with your URL. I used Python 3, though.
from urllib.request import urlopen
import json
url = urlopen("http://httpbin.org/get")
content = url.read()
newDictionary=json.loads(content)
print(newDictionary)
output: {'args': {}, 'headers': {'Accept-Encoding': 'identity', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'Python-urllib/3.6'}, 'origin': '', 'url': 'http://httpbin.org/get'}

Below is the code that worked for me.
json_data = url.read()
# Strip the JSONP wrapper: keep only what sits between "handleJSON(" and ");"
purify_data = json_data.split('handleJSON(')[1].split(');')[0]
loaded_json = json.loads(purify_data)
print(loaded_json['items'][0]['src'])
Actually, I figured out that json_data was of type string and that I was unable to decode it because of the format of that string, which was
app.mjiProduct.handleJSON(REQUIRED JSON)
So I first filtered the string, then loaded it with json, and the problem was solved.

The response doesn't contain valid JSON. It looks like executable code (probably JavaScript). But the part {"name":"M0011126_001_SET","items":[...]} is valid JSON. So if you know for sure that the response always has this format, you can strip the function call like this:
content = url.read()[26:-2]  # Cut the first 26 characters ("app.mjiProduct.handleJSON(") and the last two (");")
newDictionary = json.loads(content)
I don't know Beautiful Soup well, but from what I can find it's a library for processing HTML files, while your response is not HTML, so I don't think you should use it here.
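If the callback name or its length can change, a slightly more robust variant (just a sketch, assuming the response always has the handleJSON(...) shape shown in the question) is to pull the payload out with a regular expression instead of fixed offsets:
import json
import re
import urllib2

raw = urllib2.urlopen("https://i1.adis.ws/s/foo/M0011126_001_SET.js?func=app.mjiProduct.handleJSON&protocol=https").read()

# Take whatever sits between the first "(" and the last ")" of the JSONP wrapper
match = re.search(r'\((.*)\)\s*;?\s*$', raw, re.DOTALL)
if match:
    data = json.loads(match.group(1))
    print([item['src'] for item in data['items']])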

Related

Attempting to parse a JSON file with Python

So I've been beating my head against a wall for days now and have been diving down the Google/SO rabbit hole in search of answers. I've been debating how to phrase this question, as the API that I am pulling from may or may not contain some sensitive information that gets uncomfortably close to HIPAA territory for my liking. For that reason I will not be providing the direct link/auth for my code. That being said, I will be providing a made-up JSON snippet to help with the explanation.
import requests
import json
import urllib3
r = requests.get('https://madeup.url.com/api/vi/information here', auth=('123456789', '1111111111222222222223333333333444444455555555'))
payload = {'query': 'firstName'}
response = requests.get(r, params=payload)
json_response = response.json()
print(json.dumps(json_response))
The JSON file that I'm trying to parse looks in part like this:
"{\"id\": 123456789, \"firstName\": \"NAME\", \"lastName\": \"NAME\", \"phone\": \"NUMBER\", \"email\": \"EMAIL#gmail.com\", \"date\": \"December 16, 2021\", \"time\": \"9:50am\", \"endTime\": \"10:00am\",.....
When I run the code I get a "urllib3.exceptions.LocationParseError: Failed to parse: <Response [200]>" traceback, and I cannot for the life of me figure out what is going on. urllib3 is installed and up to date according to the console.
Any help would be much appreciated. TIA
That is not a JSON file. It is a string containing escaped characters. It needs to be unescaped before parsing can work.
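As a rough illustration with a made-up payload, a double-encoded string like that can usually be recovered by decoding it twice:
import json

# Hypothetical double-encoded payload: a JSON string whose value is itself JSON
raw = '"{\\"id\\": 123456789, \\"firstName\\": \\"NAME\\"}"'

inner = json.loads(raw)     # first pass unwraps the outer string
record = json.loads(inner)  # second pass parses the actual object
print(record['firstName'])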
You're passing r to the second requests.get(), but r is itself the response to the first requests.get()... Shouldn't you be passing params=payload in that first call and getting the response from there, in one single request?
import requests
import json
import urllib3
payload = {'query': 'firstName'}
response = requests.get('{YOUR_URL}', auth=('{USER}', '{PASS}'), params=payload)
json_response = response.json()
print(json.dumps(json_response))
That is not a JSON file. It is a string containing escaped characters. It needs to be unescaped before parsing can work.
Well, now I'm even more confused. I'm trying to teach myself Python and clearly struggling. To get the "JSON" I posted I used the following code:
r = requests.get('URL', auth=('user', 'pass'))
Data = r.json()
packages_str = json.dumps(Data[0])
with open('Data.json', 'w') as f:
    json.dump(packages_str, f)
So basically I'm even more lost now...
Okay, update: good news! Kinda... My code now reads as follows:
import requests
import json
import urllib3

payload = {
    'query1': 'firstName',
    'query2': 'lastName'
}
response = requests.get("url", auth=("user", "pass"), params=payload)
Data = response.json()
packages_str = json.dumps(Data, ensure_ascii=False, indent=2)
with open('Data.json', 'w') as f:
    json.dump(packages_str, f)
    f.write(packages_str)
And when I then open the JSON file, the first line is the entire API response as one string, but below that is a properly formatted JSON file. Unfortunately it's the entire API response, and not a parsed JSON file containing only the information that I need...
Continuing down the Google/YouTube/SO rabbit hole and will update at a later date if I find a workaround.
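For what it's worth, the stringified first line comes from dumping the already-dumped string and then writing it again. A minimal sketch (the URL, auth values and field names are placeholders) that writes the response once and keeps only a couple of fields:
import requests
import json

response = requests.get("url", auth=("user", "pass"))
data = response.json()  # already a Python dict/list; no extra json.dumps needed
# If the API returns a list, pick an element first, e.g. data = data[0]

# Keep only the fields of interest (hypothetical keys from the example payload)
subset = {key: data.get(key) for key in ("firstName", "lastName", "email")}

with open("Data.json", "w") as f:
    json.dump(subset, f, ensure_ascii=False, indent=2)  # dump exactly once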

how to write r.headers from different urls into one json?

I would like to crawl several URLs using the requests library in Python. I am scrutinizing the GET requests as well as the response headers. However, when crawling and getting the data from different URLs, I face the problem that I don't know all the 'key:value' pairs that are coming in, so writing those data to a valid CSV file is not really possible, in my view. Therefore I want to write the data into a JSON file.
The problem is similar to the following thread from 2014, but not the same:
Get a header with Python and convert in JSON (requests - urllib2 - json)
import requests, json
urls = ['http://www.example.com/', 'http://github.com']
with open('test.json', 'w') as f:
    for url in urls:
        r = requests.get(url)
        rh = r.headers
        f.write(json.dumps(dict(rh), sort_keys=True, separators=(',', ':'), indent=4))
I expect a JSON file with the headers for each URL. I do get a JSON file with those data, but my IDE (PyCharm) shows an error stating that the JSON standard allows only one top-level value. I have read the documentation (https://docs.python.org/3/library/json.html#repeated-names-within-an-object) but did not get it. Any hint would be appreciated.
EDIT: The only thing missing in the outcome is another comma. But where do I enter it, and what command do I need for this?
You need to collect the headers into an array and then do a single json dump to the file at the end. This will work:
import requests, json

urls = ['http://www.example.com/', 'http://github.com']
headers = []
for url in urls:
    r = requests.get(url)
    header_dict = dict(r.headers)
    header_dict['source_url'] = url
    headers.append(header_dict)

with open('test.json', 'w', encoding='utf-8') as f:
    json.dump(headers, f, sort_keys=True, separators=(',', ':'), indent=4)
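To check that the dump now has a single top-level value, one quick verification (purely illustrative) is to load the file back:
import json

with open('test.json', encoding='utf-8') as f:
    loaded = json.load(f)  # would fail if the file held more than one top-level value
print(type(loaded), len(loaded))  # expected: <class 'list'> with one entry per URL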
You can still write it to a CSV:
import pandas as pd
df = pd.DataFrame(headers)
df.to_csv('test.csv')

How do I get Python requests.get "application/json" from an HTML page?

Is there any way to get the JSON code from an HTML website? If I use code like this:
r = requests.get(url)
if r.status_code == 200:
    r.json()
    result = json.loads(r)
I always get an error on HTML pages. What modules should I use to get HTML pages into a Python dictionary?
You only have one error in your code.
Once you did
r.json()
You didn't assign it to anything. To correct this problem, just replace that line with the line below and you should be good :).
r = r.json()
Not all webpages respond with JSON data. You can use json.loads to parse a JSON string into a Python object, and you can use r.content or r.text to see what kind of data is coming from the webpage. Most of the time it will just be HTML content.
import requests
import json
r = requests.get('http://www.google.com')
# you can use r.content to print the webpage data
print r.content
# json.loads(r.content) tries to parse the response body as JSON (this will fail for an HTML page)
print json.loads(r.content)
json.loads will raise a ValueError if the data cannot be decoded into a JSON object.
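As a rough sketch of a defensive pattern (the URL is just a placeholder), you can check the response's Content-Type header before treating the body as JSON, rather than letting the parse fail:
import requests

r = requests.get('http://www.example.com/')  # placeholder URL
if 'application/json' in r.headers.get('Content-Type', ''):
    result = r.json()  # the server really returned JSON
else:
    result = None      # HTML or something else: don't try to parse it as JSON
    print('Not JSON, got:', r.headers.get('Content-Type'))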

Read JSON response in Python

I am trying to read the JSON response from this link, but it's not working! I get the following error:
ValueError: No JSON object could be decoded.
Here is the code I've tried:
import urllib2, json
a = urllib2.urlopen('https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?key=AIzaSyDkEX-f1JNLQLC164SZaobALqFv4PHV-kA&screenshot=true&snapshots=true&locale=en_US&url=https://www.economicalinsurance.com/en/&strategy=mobile&filter_third_party_resources=false&callback=_callbacks_._DElanZU7Xh1K')
data = json.loads(a)
I made these changes:
import requests, json
r=requests.get('https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?key=AIzaSyDkEX-f1JNLQLC164SZaobALqFv4PHV-kA&screenshot=true&snapshots=true&locale=en_US&url=https://www.economicalinsurance.com/en/&strategy=mobile&filter_third_party_resources=false')
json_data = json.loads(r.text)
print json_data['ruleGroups']['USABILITY']['score']
A quick question about constructing the image link.
I was able to get this far:
from selenium import webdriver
txt = json_data['screenshot']['data']
txt = str(txt).replace('-','/').replace('_','/')
#then in order to construct the image link i tried : -
image_link = 'data:image/jpeg;base64,'+txt
driver = webdriver.Firefox()
driver.get(image_link)
The problem is that I am not getting the image; also, len(object_original) differs from len(image_link). Could anybody please advise what is missing from my constructed image link? Thank you.
Here is the API link - https://www.google.co.uk/webmasters/tools/mobile-friendly/ Sorry, added it late.
Two corrections need to be made to your code:
The URL was corrected (as mentioned by Felix Kling here): you have to remove the callback parameter from the GET request you were sending.
Also, if you check the type of the response you were fetching earlier, you'll notice that it wasn't a string; it was <type 'instance'>. And since json.loads() accepts a string as its parameter, you would have got another error. Therefore, use a.read() to fetch the response data as a string.
Hence, this should be your code:
import urllib2, json
a = urllib2.urlopen('https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?key=AIzaSyDkEX-f1JNLQLC164SZaobALqFv4PHV-kA&screenshot=true&snapshots=true&locale=en_US&url=https://www.economicalinsurance.com/en/&strategy=mobile&filter_third_party_resources=false')
data = json.loads(a.read())
The answer to your second query (regarding the image) is:
import base64

arr = json_data['screenshot']['data']
# The screenshot data is URL-safe base64; map it back to the standard alphabet before decoding
arr = arr.replace("_", "/")
arr = arr.replace("-", "+")
with open("imageToSave.jpeg", "wb") as fh:
    fh.write(base64.b64decode(arr))
Here is the image you were trying to fetch - Link
Felix Kling is right about the address, but I also created a variable that holds the URL. You can try this out too, and it should work:
import urllib2, json
url = "https://www.googleapis.com/pagespeedonline/v3beta1/mobileReady?key=AIzaSyDkEX-f1JNLQLC164SZaobALqFv4PHV-kA&screenshot=true&snapshots=true&locale=en_US&url=https://www.economicalinsurance.com/en/&strategy=mobile&filter_third_party_resources=false"
response = urllib2.urlopen(url)
data = json.loads(response.read())
print data

Django request Post json

I am trying to test a view. I receive a JSON request from the iPad; the format is:
req = {"custom_decks": [
{
"deck_name": "deck_test",
"updates_last_applied": "1406217357",
"created_date": 1406217380,
"slide_section_ids": [
1
],
"deck_id": 1
}
],
"custom_decks_to_delete": []
}
I checked this in jsonlint and it passed.
I post the req via:
response = self.client.post('/library/api/6.0/user/' + uuid + '/store_custom_dec/',
                            content_type='application/json', data=req)
The view returns "creation_success": false.
The problem is that the post method in the view doesn't find the key custom_decks.
QueryDict: {u'{"custom_decks": [{"deck_id": 1, "slide_section_ids": [1],
"created_date":1406217380, "deck_name": "deck_test"}],
"custom_decks_to_delete": []}': [u'']}>
The post method in the view doesn't find the key custom_decks because it is converting my dict into a QueryDict with a single key.
I appreciate all help.
Thanks
You're posting JSON, which is not the same as form-encoded data. You need to get the value of request.body and deserialize it:
data = json.loads(request.body)
custom_decks = data['custom_decks']
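For context, here is a minimal sketch of how that could sit inside the view (the view name, uuid argument and success flag are assumptions based on the question; JsonResponse needs Django 1.7+):
import json
from django.http import JsonResponse

def store_custom_dec(request, uuid):
    data = json.loads(request.body)
    custom_decks = data.get('custom_decks', [])
    # ... create/update the decks here ...
    return JsonResponse({'creation_success': bool(custom_decks)})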
As I was having problems getting JSON data from the HttpRequest directly with the code from the other answer:
data = json.loads(request.body)
custom_decks = data['custom_decks']
error:
the JSON object must be str, not 'bytes'
Here is an update of the other answer for Python 3:
json_str = request.body.decode('utf-8')
json_obj = json.loads(json_str)
Regarding decode('utf-8'), as mentioned in
RFC 4627:
"JSON text shall be encoded in Unicode. The default encoding is
UTF-8."
I attached the Python issue that refers to this specific problem in Python 3:
http://bugs.python.org/issue10976
Python 3.6 and Django 2.0:
post_json = json.loads(request.body)
custom_decks = post_json.get("custom_decks")
json.loads(s, *, encoding=None,...)
Changed in version 3.6: s can now be of type bytes or bytearray. The input encoding should be UTF-8, UTF-16 or UTF-32.
From Python 3.6 on, there is no need for request.body.decode('utf-8').
Since HttpRequest has a read() method, loading JSON from the request is actually as simple as:
def post(self, request, *args, **kwargs):
    import json
    data = json.load(request)
    return JsonResponse(data=data)
If you put this up as a view, you can test it and it will echo back any JSON you send to it.
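As a quick usage sketch (assuming the echo view above is mounted at a hypothetical /echo/ URL, and Django 1.9+ for response.json()), Django's test client can exercise it like this:
import json
from django.test import Client

client = Client()
response = client.post(
    '/echo/',  # hypothetical URL the echo view is wired up at
    data=json.dumps({"custom_decks": []}),
    content_type='application/json',
)
print(response.json())  # prints the same payload back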