Python HDFS : Cannot parse json document - json

I am following the simple piece of code from the documentation
http://hdfscli.readthedocs.org/en/latest/quickstart.html
with client.read(path, encoding='utf-8') as reader:
    print reader
    from json import load
    model = load(reader)
The path is valid. I get:
<requests.packages.urllib3.response.HTTPResponse object at 0x0000000003148048>
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte
The first line is the output of print reader. Why am I getting this error? Is there another way to load a JSON object from HDFS? I know the object is JSON because that is how I put it in. Is there a way to ignore the error? Why doesn't the encoding work?
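One possible cause, offered as an assumption since the thread does not confirm it: a file that starts with the bytes 0x1f 0x8b carries the gzip magic number, and 0x8b at position 1 matches that pattern. If the file was stored compressed, decoding it as UTF-8 fails before the JSON parser ever runs. The sketch below uses in-memory bytes instead of a real HDFS read, and load_json_bytes is a hypothetical helper name.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def load_json_bytes(raw):
    """Parse JSON from raw bytes, decompressing first if they look gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))

# Simulated file contents (assumption): in practice the bytes would be read
# from the file without asking the client to decode them as text.
payload = gzip.compress(json.dumps({"model": "demo"}).encode("utf-8"))
print(load_json_bytes(payload))
```

Plain, uncompressed JSON bytes pass through the same helper unchanged, so it can be used when it is unclear how the file was written.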

Related

Beginner help with Python

I am taking my first steps with Python and JSON and hope you can help me with this.
I am getting a json in a variable which looks like this:
host10 = [{'hostid': '10084', 'proxy_hostid': '0'},
          {'hostid': '10085', 'proxy_hostid': '1'}]
when I run
hosts = json.loads(host10)
print(hosts)
I am getting the error:
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not list
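What am I doing wrong?
host10 is already a Python list, not a JSON string, so json.loads has nothing to parse. To serialize the list to a JSON string, use json.dumps: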
import json
json.dumps(list_name)
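A short sketch of the distinction, reusing the host10 list from the question: json.dumps turns a Python object into a JSON string, and json.loads goes the other way, so loads only makes sense once you actually hold a string.

```python
import json

# Already a Python list of dicts; there is nothing to parse here.
host10 = [{'hostid': '10084', 'proxy_hostid': '0'},
          {'hostid': '10085', 'proxy_hostid': '1'}]

# Serialize the list to a JSON string ...
as_text = json.dumps(host10)
print(type(as_text).__name__)

# ... and parse that string back into Python objects.
hosts = json.loads(as_text)
print(hosts[1]['hostid'])
```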

TypeError: Object of type bytes is not JSON serializable - python 3 - try to post base64 image data

I received this error after trying to convert data to JSON for a POST request:
TypeError: Object of type 'bytes' is not JSON serializable
My code:
dict_data: dict = {
    'img': base64.b64encode(urlopen(obj['recognition_image_path']).read())
}
json_data: str = json.dumps(dict_data)
I read the image from a URL and convert it to Base64; the error occurs when I try to convert the data to JSON.
Please help.
You need to convert to string first by calling .decode, since you can't JSON-serialize a bytes without knowing its encoding.
(base64.b64encode returns a bytes, not a string.)
import base64
from urllib.request import urlopen
import json
dict_data: dict = {
    'img': base64.b64encode(urlopen(obj['recognition_image_path']).read()).decode('utf8')
}
json_data: str = json.dumps(dict_data)
edit: rewrite answer to address actual question, encode/decode
I would do it in three steps:
First, encode the image file as Base64.
Then decode the resulting bytes into a string.
Finally, transmit the JSON data using the decoded string.
Here is an example:
Let's say the image file is is_image_file
Encode the image file by:
enc_image_file = base64.b64encode(is_image_file.read())
Next decode it by:
send_image_file = enc_image_file.decode()
Finally transmit the data using send_image_file as JsonResponse to wherever it would be used.
Of course, add import base64 before calling the function.
Note: Using json.dumps(dict_data) one gets a string which will not load the image/s.
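As a rough check that the decoded string survives a JSON round trip, the sketch below uses a few made-up bytes (fake_image, a placeholder rather than a real image) in place of the downloaded file and recovers them on the receiving side with base64.b64decode.

```python
import base64
import json

# Placeholder bytes standing in for real image data (assumption for illustration).
fake_image = b'\x89PNG\r\n\x1a\n' + bytes(range(16))

# Sender: Base64-encode, then decode the ASCII bytes into a str for JSON.
encoded_text = base64.b64encode(fake_image).decode('ascii')
json_data = json.dumps({'img': encoded_text})

# Receiver: parse the JSON and reverse the Base64 step to get the bytes back.
received = json.loads(json_data)
restored = base64.b64decode(received['img'])
print(restored == fake_image)
```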

Parse Twitter JSON Content in Python 3

I searched all similar questions and still couldn't resolve the issue below.
Here's my json file content:
https://objectsoftconsultants.s3.amazonaws.com/tweets.json
Code to get a particular element is as below:
import json
testsite_array = []
with open('tweets.json') as json_file:
    testsite_array = json_file.readlines()
for text in testsite_array:
    json_text = json.dumps(text)
    resp = json.loads(json_text)
    print(resp["created_at"])
I keep getting the error below:
print(resp["created_at"])
TypeError: string indices must be integers
Thanks much for your time and help, well in advance.
I have to guess what you're trying to do and can only hope that this will help you:
import json
# Parse the whole file as a single JSON document.
with open('tweets.json') as f:
    tweets = json.load(f)
print(tweets['created_at'])
It doesn't make sense to read a json file with readlines, because it is unlikely that each line of the file represents a complete json document.
Also I don't get why you're dumping the string only to load it again immediately.
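The reason for the TypeError can be reproduced in isolation. In this sketch the sample line is made up; json.dumps wraps the line in quotes as a JSON string, so json.loads hands back the same str, and indexing a str with a key fails.

```python
import json

# Made-up line standing in for one line of tweets.json.
line = '{"created_at": "Mon Jan 01 00:00:00 +0000 2018"}'

# dumps() encodes the text itself as a JSON string; loads() undoes exactly that.
round_tripped = json.loads(json.dumps(line))
print(type(round_tripped).__name__)

# Parsing the line directly yields a dict that can be indexed by key.
tweet = json.loads(line)
print(tweet['created_at'])
```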
Update:
Try this to parse your file line by line:
import json
with open('tweets.json') as f:
    lines = f.readlines()
for line in lines:
    try:
        # Each line is parsed as its own JSON document.
        tweet = json.loads(line)
        print(tweet['created_at'])
    except json.decoder.JSONDecodeError:
        print('Error')
I want to point out however, that I do not recommend this approach. A file should contain only one json document. If the file does not contain a valid json document, the source for the file should be fixed.
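If the file really does hold one JSON document per line, the loop above can be wrapped in a small helper. This is only a sketch: iter_json_lines is a hypothetical name, and io.StringIO with made-up lines stands in for the real file.

```python
import io
import json

def iter_json_lines(lines):
    """Yield one parsed object per non-empty line, skipping lines that are not valid JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Malformed line: ignored here; real code might log it instead.
            continue

# Made-up sample: two valid documents, one blank line, one broken line.
sample = io.StringIO(
    '{"created_at": "day one"}\n'
    '\n'
    'not json\n'
    '{"created_at": "day two"}\n'
)
tweets = list(iter_json_lines(sample))
print([t['created_at'] for t in tweets])
```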

Parse a JSON file with ISODate in Python

I have a JSON file with some lines like:
"updatedAt" : ISODate("2018-11-20T09:32:16.732+0000"),
I tried json.loads but it has an error json.decoder.JSONDecodeError: Expecting value: line 2 column 13 (char 15).
I believe that the problem is at ISODate () but how could I handle that with Python?
Many thanks
This is not valid JSON to begin with. I guess the ISODate("...") comes from MongoDB, perhaps because the ISODate() helper was dumped directly into the JSON instead of its string representation.
In any case, you could use a regex on the whole JSON-string to get rid of the ISODate("..."), retrieve the date as a string and then use python-dateutil to parse the value to a datetime.datetime.
Something to the tune of
import json
import re
import dateutil.parser
json_str = ...  # the raw file contents as a string
# Replace ISODate("...") with just the quoted date string.
clean_json = re.compile(r'ISODate\(("[^"]+")\)').sub(r'\1', json_str)
json_obj = json.loads(clean_json)
# use dateutil.parser.parse(s) to parse each date into a datetime.datetime
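A self-contained sketch of the same idea. To avoid a third-party dependency it swaps python-dateutil for the standard library's datetime.strptime, which assumes every date has the exact shape shown in the question; the sample document and the helper name load_mongo_dump are made up for illustration.

```python
import json
import re
from datetime import datetime

# Matches ISODate("...") and captures the quoted date, quotes included.
ISODATE_RE = re.compile(r'ISODate\(("[^"]+")\)')

def load_mongo_dump(text):
    """Replace ISODate("...") with the bare quoted string, then parse as JSON."""
    return json.loads(ISODATE_RE.sub(r'\1', text))

# Made-up document using the date format from the question.
sample = '''{
    "updatedAt" : ISODate("2018-11-20T09:32:16.732+0000"),
    "name" : "example"
}'''

doc = load_mongo_dump(sample)
# Convert the remaining plain string into a timezone-aware datetime.
updated = datetime.strptime(doc['updatedAt'], '%Y-%m-%dT%H:%M:%S.%f%z')
print(updated.isoformat())
```

If the dates in a real dump vary in format, python-dateutil's parser, as suggested above, is the more forgiving choice.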

Error parsing JSON file in python 3.4

I am trying to load a JSON file from a URL and parse it in Python 3.4, but I get a few errors and have no idea what they point to. I verified the JSON at that URL with jsonlint.com and it seems fine. data.read() returns bytes, which I have cast to a string. The code is:
import urllib.request
import json
inp = input("enter url :")
if len(inp)<1: inp ='http://python-data.dr-chuck.net/comments_42.json'
data=urllib.request.urlopen(inp)
data_str = str(data.read())
print(type(data_str))
parse_data = json.loads(data_str)
print(type(parse_data))
The error that I'm getting is:
The expression str(data.read()) doesn't "cast" your bytes into a string, it just produces a string representation of them. This can be seen if you print data_str: it's a str beginning with b'.
To actually decode the bytes so that the JSON can be parsed, you need to do data_str = data.read().decode('utf-8')
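The difference is easy to see on a small in-memory value. The bytes below are made up and stand in for what data.read() returns.

```python
import json

# Made-up payload standing in for the bytes returned by data.read().
raw = b'{"comments": [{"name": "Ann", "count": 97}]}'

# str() yields the repr of the bytes object, including the leading b' prefix.
as_repr = str(raw)
print(as_repr[:2])

# decode() turns the bytes into real text that json.loads can parse.
parsed = json.loads(raw.decode('utf-8'))
print(parsed['comments'][0]['count'])
```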