Parsing HTTP Response in Python - json

I want to manipulate the information at THIS url. I can successfully open it and read its contents. But what I really want to do is throw out all the stuff I don't want, and to manipulate the stuff I want to keep.
Is there a way to convert the string into a dict so I can iterate over it? Or do I just have to parse it as is (str type)?
from urllib.request import urlopen
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)
print(response.read()) # returns string with info

When I printed response.read() I noticed that b was prepended to the string (e.g. b'{"a":1,..). The "b" stands for bytes and indicates the type of the object you're handling. Since I knew that a string could be converted to a dict by using json.loads('string'), I just had to convert the bytes object to a string. I did this by decoding the response as UTF-8 with decode('utf-8'). Once it was a string my problem was solved and I could easily iterate over the dict.
I don't know if this is the fastest or most 'pythonic' way of writing this, but it works, and there's always time later for optimization and improvement! Full code for my solution:
from urllib.request import urlopen
import json
# Get the dataset
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)
# Convert bytes to string type and string type to dict
string = response.read().decode('utf-8')
json_obj = json.loads(string)
print(json_obj['source_name']) # prints the string with 'source_name' key
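The two conversion steps can be tried without a network call. This is a minimal sketch, assuming a hand-written payload: the `raw` bytes and their keys are made up for illustration and are not the real API response. Since Python 3.6, json.loads also accepts bytes directly, provided they are encoded as UTF-8, UTF-16 or UTF-32.

```python
import json

# Hypothetical payload standing in for response.read(); not real API data
raw = b'{"source_name": "Example Source", "data": [["2014-01-01", 1.5]]}'

# Step 1: bytes -> str, step 2: str -> dict
text = raw.decode('utf-8')
parsed = json.loads(text)

# Python 3.6+ only: json.loads accepts UTF-8 encoded bytes directly
parsed_from_bytes = json.loads(raw)

# Iterate over the resulting dict like any other
for key, value in parsed.items():
    print(key, value)
```

The explicit decode is still the safer habit when the server may send something other than UTF-8.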

You can also use python's requests library instead.
import requests
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = requests.get(url)
data = response.json()
Now you can manipulate "data" like a Python dictionary.

json works with Unicode text in Python 3 (the JSON format itself is defined only in terms of Unicode text), and therefore you need to decode the bytes received in the HTTP response. r.headers.get_content_charset('utf-8') gets you the character encoding, falling back to UTF-8 if the response does not declare one:
#!/usr/bin/env python3
import io
import json
from urllib.request import urlopen
with urlopen('https://httpbin.org/get') as r, \
     io.TextIOWrapper(r, encoding=r.headers.get_content_charset('utf-8')) as file:
    result = json.load(file)
print(result['headers']['User-Agent'])
It is not necessary to use io.TextIOWrapper here:
#!/usr/bin/env python3
import json
from urllib.request import urlopen
with urlopen('https://httpbin.org/get') as r:
    result = json.loads(r.read().decode(r.headers.get_content_charset('utf-8')))
print(result['headers']['User-Agent'])
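The charset lookup can be checked offline. r.headers is an http.client.HTTPMessage, which is a subclass of email.message.Message, so a plain Message behaves the same way for this call. A rough sketch; the helper name `charset_of` and the sample header values are made up for illustration:

```python
from email.message import Message

def charset_of(content_type):
    """Return the declared charset, or 'utf-8' when none is declared."""
    headers = Message()
    if content_type is not None:
        headers['Content-Type'] = content_type
    # The argument is the fallback used when no charset parameter is present
    return headers.get_content_charset('utf-8')

declared = charset_of('application/json; charset=ISO-8859-1')
missing = charset_of('application/json')
no_header = charset_of(None)
print(declared, missing, no_header)
```

The declared charset comes back lowercased, and both the missing-parameter and missing-header cases fall back to the default that was passed in.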

TL;DR: Data from a server typically arrives as bytes. The rationale is that the recipient, who should know how to use the data, is the one to decode them. Decode the bytes on arrival so that you work with a string rather than a 'b' (bytes) object.
Use case:
import requests
def get_data_from_url(url):
    response = requests.get(url)
    response_data_split_by_line = response.content.decode('utf-8').splitlines()
    return response_data_split_by_line
In this example, I decode the content that I received as UTF-8. For my purposes, I then split it by line, so I can loop through each line with a for loop.

I guess things have changed in python 3.4. This worked for me:
print("resp:" + json.dumps(resp.json()))

Related

TypeError: Object of type bytes is not JSON serializable - python 3 - try to post base64 image data

I received this error when trying to convert data to JSON for a POST request:
TypeError: Object of type 'bytes' is not JSON serializable
My code:
dict_data: dict = {
    'img': base64.b64encode(urlopen(obj['recognition_image_path']).read())
}
json_data: str = json.dumps(dict_data)
I read the image from a URL and convert it to base64, then I get the error when I try to convert the data to JSON.
Please help.
You need to convert to string first by calling .decode, since you can't JSON-serialize a bytes without knowing its encoding.
(base64.b64encode returns a bytes, not a string.)
import base64
from urllib.request import urlopen
import json
dict_data: dict = {
    'img': base64.b64encode(urlopen(obj['recognition_image_path']).read()).decode('utf8')
}
json_data: str = json.dumps(dict_data)
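To see that nothing is lost by going through a string, here is a small round-trip sketch. The `image_bytes` value is a made-up stand-in for the downloaded image, and 'ascii' is used for the decode because base64 output only contains ASCII characters (the 'utf8' used above gives the same result).

```python
import base64
import json

# Hypothetical stand-in for the downloaded image bytes
image_bytes = b'\x89PNG\r\n\x1a\n' + bytes(range(8))

# Sender: bytes -> base64 bytes -> ASCII str -> JSON text
payload = json.dumps({'img': base64.b64encode(image_bytes).decode('ascii')})

# Receiver: JSON text -> base64 str -> original bytes
restored = base64.b64decode(json.loads(payload)['img'])
print(restored == image_bytes)
```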
edit: rewrite answer to address actual question, encode/decode
I would do it as a two-step process:
First encode the image file into base64.
Then decode the encoded bytes into a string.
And then transmit the JSON data using the decoded string.
Here is an example:
Let's say the image file is is_image_file
Encode the image file by:
enc_image_file = base64.b64encode(is_image_file.read())
Next decode it by:
send_image_file = enc_image_file.decode()
Finally transmit the data using send_image_file as JsonResponse to wherever it would be used.
Of course, add import base64 before calling the function.
Note: json.dumps(dict_data) produces a JSON string containing the base64 text, not the image itself; the receiver has to base64-decode the value to get the image bytes back.

Python: Build DataFrame from parts of JSON response

I am trying to develop an application to retrieve stock prices (in JSON) and then do some analysis on them. My problem is with getting the JSON response into a pandas DataFrame where I can work. Here is my code:
'''
References
http://stackoverflow.com/questions/6862770/python-3-let-json-object-accept-bytes-or-let-urlopen-output-strings
'''
import json
import pandas as pd
from urllib.request import urlopen
#set API call
url = "https://www.quandl.com/api/v3/datasets/WIKI/AAPL.json?start_date=2017-01-01&end_date=2017-01-31"
#make call and receive response
response = urlopen(url).read().decode('utf8')
dataresponse = json.loads(response)
#check incoming
#print(dataresponse)
df = pd.read_json(dataresponse)
print(df)
The application errors at df = pd.read_json... with error TypeError: Expected String or Unicode.
So I reckon this is the first hurdle.
The second is getting where I need to. The JSON response contains only two arrays I am interested in, column_names and data. How do I extract only these two and put into a pandas DataFrame?
To answer your first question, pd.read_json takes a JSON string directly, so you should be doing this:
pd.read_json(response)
But instead, considering how the data is structured, it's best to first convert the JSON string to a dictionary containing the data:
d = json.loads(response)
Then simply build the dataframe from d['dataset']['data'] and d['dataset']['column_names']:
pd.DataFrame(data=d['dataset']['data'], columns=d['dataset']['column_names'])
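For illustration, here is the same construction on a small hand-written dictionary. The column names and values are made up and only mimic the shape described in the question (a 'dataset' object holding 'column_names' and 'data'); the date parsing at the end is an optional extra, not part of the answer above.

```python
import pandas as pd

# Hypothetical response shaped like the one described above; values are made up
d = {
    'dataset': {
        'column_names': ['Date', 'Open', 'Close'],
        'data': [
            ['2017-01-04', 10.0, 10.5],
            ['2017-01-03', 9.5, 10.0],
        ],
    }
}

# Each inner list becomes a row, column_names supplies the header
df = pd.DataFrame(data=d['dataset']['data'], columns=d['dataset']['column_names'])

# Optional: parse the date column and use it as a sorted index
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').sort_index()
print(df)
```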

Python3 PUT string, not bytes

I am trying to get Python3 to PUT json info in string format to an API. And I want to do it without
import requests
Thus far I am stuck with this code:
import urllib.request
import urllib.parse
import json
url = "http://www.example.com"
DATA = json.dumps({'grades': {"math": "92", "chem": "39"}})
req = urllib.request.Request(url, data=DATA, method='PUT')
response = urllib.request.urlopen(req)
Naturally this raises the error:
raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.
To get rid of the error I can do:
DATA= str.encode(DATA)
But this turns my data into bytes format, instead of string that I want to put up. Is there a way to PUT up strings without importing "requests"?(importing anything that comes with python install is OK). Or can I PUT up a *.json file?
Essentially I am trying to do the opposite of this.
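One point that may help here: an HTTP body is always transmitted as bytes, so encoding the JSON string does not change what the server receives. The encoded bytes are simply the UTF-8 representation of the same text. A minimal sketch using only the standard library; the Content-Type header is my own addition, and the request is only built here, not sent.

```python
import json
import urllib.request

url = "http://www.example.com"
body = json.dumps({'grades': {"math": "92", "chem": "39"}})

req = urllib.request.Request(
    url,
    data=body.encode('utf-8'),  # str -> bytes, as urllib requires
    headers={'Content-Type': 'application/json; charset=utf-8'},
    method='PUT',
)

# The bytes decode back to exactly the JSON text that was built above
print(req.get_method(), req.data.decode('utf-8') == body)

# To actually send it (requires a server that accepts PUT):
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```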

Error parsing JSON file in python 3.4

I am trying to load a JSON file from a URL and parse it in Python 3.4, but I get a few errors and I have no idea what they are pointing to. I verified the JSON file at the URL with jsonlint.com and the file seems fine. data.read() returns bytes, which I have type cast to a string. The code is:
import urllib.request
import json
inp = input("enter url :")
if len(inp)<1: inp ='http://python-data.dr-chuck.net/comments_42.json'
data=urllib.request.urlopen(inp)
data_str = str(data.read())
print(type(data_str))
parse_data = json.loads(data_str)
print(type(parse_data))
The error that i'm getting is:
The expression str(data.read()) doesn't "cast" your bytes into a string, it just produces a string representation of them. This can be seen if you print data_str: it's a str beginning with b'.
To actually decode the JSON, you need to do data_str = data.read().decode('utf-8')
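The difference between the two calls is easy to see on a small sample. This sketch uses made-up bytes in place of data.read():

```python
import json

# Hypothetical bytes standing in for data.read()
raw = b'{"comments": [{"name": "A", "count": 1}]}'

as_repr = str(raw)             # repr of the bytes object, begins with b'
as_text = raw.decode('utf-8')  # the actual JSON text

# The repr is not valid JSON, so parsing it fails
try:
    json.loads(as_repr)
    repr_parses = True
except json.JSONDecodeError:
    repr_parses = False

parsed = json.loads(as_text)
print(repr_parses, parsed['comments'][0]['count'])
```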

Handling application/json data with bottle

I'm trying to write a simple server frontend to a python3 application, using a restful JSON-based protocol. So far, bottle seems the best suited framework for the task (it supports python3, handles method dispatching in a nice way, and easily returns JSON.) The problem is parsing the JSON in the input request.
The documentation only mentions request.fields and request.files, both of which I assume refer to multipart/form-data data. There is no mention of accessing the request data directly.
Peeking at the source code, I can see a request.body object of type BytesIO. json.load refuses to act on it directly, dying in the json lib with can't use a string pattern on a bytes-like object. The proper way to do it may be to first decode the bytes to unicode characters, according to whichever charset was specified in the Content-Type HTTP header. I don't know how to do that; I can see a StringIO class and assume it may hold a buffer of characters instead of bytes, but see no way of decoding a BytesIO to a StringIO, if this is even possible at all.
Of course, it may also be possible to read the BytesIO object into a bytestring, then decode it into a string before passing it to the JSON decoder, but if I understand correctly, that breaks the nice buffering behavior of the whole thing.
Or is there any better way to do it ?
It seems that io.TextIOWrapper from the standard library does the trick !
import json
from io import TextIOWrapper
def parse(request):
    encoding = ...  # get encoding from headers
    return json.load(TextIOWrapper(request.body, encoding=encoding))
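The same idea can be tried outside Bottle with a plain BytesIO. A minimal sketch, assuming a made-up request body and a hard-coded 'utf-8' where a real handler would take the charset from the Content-Type header:

```python
import io
import json

# Hypothetical request body; in Bottle this would be request.body (a BytesIO)
body = io.BytesIO('{"name": "café", "value": 1}'.encode('utf-8'))

# Assumed charset; a real handler would read it from the Content-Type header
encoding = 'utf-8'

# TextIOWrapper decodes the bytes to text as json.load reads from it
result = json.load(io.TextIOWrapper(body, encoding=encoding))
print(result)
```

Note that json.load reads the whole stream with a single read() call, so the wrapper handles the decoding but the parsing itself is not streamed.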
Here's what I do to read in JSON on a RESTful service with Python 3 and Bottle:
import bottle
import bson.json_util as bson_json
# `app` is assumed to be an existing bottle.Bottle() instance
@app.post('/location/API')
def post_json_example():
    """
    param: _id, value
    return: I usually return something like {"status": "successful", "message": "description"}
    """
    query_string = bottle.request.query.json
    query_dict = bson_json.loads(query_string)
    _id = query_dict['_id']
    value = query_dict['value']
Then, to test from a Python 3 interpreter:
import requests
s = requests.Session()
r = s.post('http://youserver.com:8080/location/API?json={"_id":"540a16663dafb492a0a7626c","value":"test"}')
Use r.text to verify what was returned.
I wrote a helper based on b0fh's good idea.
After two weeks of analyzing response.json, I came to Stack Overflow and understood that a workaround is needed.
Here it is:
import json
from io import TextIOWrapper
from bottle import post, request, response
# _allow_origin and _allow_methods are assumed to be defined elsewhere
def json_app_rqt():
    # about request
    request.accept = 'application/json, text/plain; charset=utf-8'
def json_app_resp():
    # about response
    response.headers['Access-Control-Allow-Origin'] = _allow_origin
    response.headers['Access-Control-Allow-Methods'] = _allow_methods
    # response.headers['Access-Control-Allow-Headers'] = _allow_headers
    response.headers['Content-Type'] = 'application/json; charset=utf-8'
def json_app():
    json_app_rqt()
    json_app_resp()
def get_json_request(rqt):
    with TextIOWrapper(rqt.body, encoding="UTF-8") as json_wrap:
        json_text = ''.join(json_wrap.readlines())
        json_data = json.loads(json_text)
        return json_data
To use it, we can do:
if __name__ == "__main__":
    json_app()
@post("/train_control/:control")
def do_train_control(control):
    json_app_resp()
    data = get_json_request(request)
    print(json.dumps(data))
    return data
Thanks to all