Unable to print output of JSON code into a .csv file - json

I'm getting the following errors when trying to decode this data; the second error appeared after I tried to compensate for the Unicode error:
Error 1:
write.writerows(subjects)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 160: ordinal not in range(128)
Error 2:
with open("data.csv", encode="utf-8", "w",) as writeFile:
SyntaxError: non-keyword arg after keyword arg
Code
import requests
import json
import csv
from bs4 import BeautifulSoup
import urllib
r = urllib.urlopen('https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=10000&page=1')
data = json.loads(r.read().decode('utf-8'))
subjects = []
for post in data['posts']:
    subjects.append([post['title'], post['episodeNumber'],
                     post['audioSource'], post['image']['large'], post['excerpt']['long']])
with open("data.csv", encode="utf-8", "w",) as writeFile:
    write = csv.writer(writeFile)
    write.writerows(subjects)

Using requests, and with the correction to the second part (as below), I have no problem running this. I think your first error is a consequence of the second one being incorrect.
I am on Python 3 and can run your code with my fix to the open() line and with
r = urllib.request.urlopen('https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=10000&page=1')
I personally would use requests:
import requests
import csv

data = requests.get('https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=10000&page=1').json()
subjects = []
for post in data['posts']:
    subjects.append([post['title'], post['episodeNumber'],
                     post['audioSource'], post['image']['large'], post['excerpt']['long']])
with open("data.csv", encoding="utf-8", mode="w") as writeFile:
    write = csv.writer(writeFile)
    write.writerows(subjects)
For your second error: looking at the documentation for the open() function, you need to use the correct argument names, and you must name the mode argument if the arguments are not in positional order.
with open("data.csv", encoding="utf-8", mode="w") as writeFile:

Related

Error while running a Python script - storing a data object in JSON

I've extracted data via an API and had to transform it to read the data in tabular format. Sample code:
import json
import ast
import requests
import pandas as pd
from pandas import json_normalize

result = requests.get('https://website.com/api')
data = result.json()
df = pd.DataFrame(data['result']['records'])
Every time I run the above Python (.py) file in the terminal, I get an error on the line that says:
in <module>
data = result.json()
Also this:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Not sure why I am getting this error. Can anyone tell me how to fix this?
Any help would be appreciated.
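That "Expecting value: line 1 column 1 (char 0)" error usually means the response body is not JSON at all, for example an HTML error page or an empty body. A minimal way to check, using the placeholder URL from the question:

import requests

result = requests.get('https://website.com/api')  # placeholder URL from the question
print(result.status_code)                  # anything other than 200 is a red flag
print(result.headers.get('Content-Type')) # should be application/json
print(result.text[:500])                   # inspect the raw body before calling .json()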

line-delimited json format txt file, how to import with pandas

I have a line-delimited JSON file with a .txt extension. Now I want to import it with pandas. Usually I would import it with one of
df = pd.read_csv('df.txt')
df = pd.read_json('df.txt')
df = pd.read_fwf('df.txt')
but they all fail:
read_csv gives ParserError: Error tokenizing data. C error: Expected 29 fields in line 1354, saw 34
read_json gives ValueError: Trailing data
read_fwf returns the data, but organized in a weird way, with the column name on the left, next to the data.
Can anyone tell me how to solve this?
pd.read_json('df.txt', lines=True)
read_json accepts a boolean argument lines, which reads the file as one JSON object per line.
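A self-contained illustration of the lines=True behaviour (hypothetical two-row data):

import io
import pandas as pd

jsonl = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n'  # hypothetical line-delimited sample
df = pd.read_json(io.StringIO(jsonl), lines=True)
print(df)  # two rows, columns a and b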

How to read a gzip jsonl from a byte offset to debug a BigQuery error?

I am exporting newline-delimited JSON files into BigQuery, and the BigQuery errors give me a byte offset into the original gzipped jsonl file, such as
JSON parsing error in row starting at position 727720: Repeated field must be imported as a JSON array. Field: named_entities.alt_form.
I have tried using the Python package indexed_gzip to read from the offset, but indexed_gzip mangles the lines, sadly. I have also tried using the built-in Python gzip package to get the relevant line, unsuccessfully:
import gzip
import ujson as json

f = open('myfile.json.gz', 'rb')
g = gzip.GzipFile(fileobj=f)
fasz = g.read()
byte_offset_to_line = {}
for line in g:
    byte_offset = f.tell()
    byte_offset_to_line[byte_offset] = line

target = 727720
ls = sorted([(abs(target - k), k) for k in byte_offset_to_line.keys() if k < target])
line_of_interest = byte_offset_to_line[ls[0]]
text = str(line_of_interest)
malformed_json = json.loads(text[2:-3])
With the above snippet I can get the nearest line's byte offset. But when I tried uploading just that line to a test table in BQ, it works, sadly, so I think I am not getting the correct line.
I was wondering if there is a better approach to solve this problem? I am not sure why my snippet doesn't work, to be honest.
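Two details of the snippet may explain it: g.read() exhausts the stream before the for loop runs, so the dictionary stays empty, and f.tell() reports an offset into the compressed file, while BigQuery's position refers to the uncompressed data. A sketch that tracks uncompressed offsets instead, assuming the filename and offset from the question:

import gzip

target = 727720  # byte offset reported by BigQuery, into the uncompressed data

offset = 0
with gzip.open('myfile.json.gz', 'rb') as g:
    for line in g:
        # offset is the uncompressed position at which this line starts
        if offset <= target < offset + len(line):
            print('line starts at byte', offset)
            print(line[:200])
            break
        offset += len(line)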

Serialise and deserialise pandas periodIndex series

The pandas Series.to_json() function is creating unreadable JSON when using a PeriodIndex.
The error that occurs is:
json.decoder.JSONDecodeError: Expecting ':' delimiter: line 1 column 5 (char 4)
I've tried changing the orient, but in every combination of serialising and deserialising, the index is lost.
idx = pd.PeriodIndex(['2019', '2020'], freq='A')
series = pd.Series([1, 2], index=idx)
json_series = series.to_json() # This is a demo - in reality I'm storing this in a database, but this code throws the same error
value = json.loads(json_series)
See the pandas to_json docs and the Python json library docs.
The reason I'm not using json.dumps is that the pandas Series object is not serialisable.
Python 3.7.3, pandas 0.24.2
A workaround is to convert the PeriodIndex to a regular Index before dumping, and convert it back to a PeriodIndex after loading:
regular_idx = period_idx.astype(str)
# then dump
# after load
period_idx = pd.to_datetime(regular_idx).to_period()
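A complete round trip under that workaround might look like this (a sketch; freq='A' is taken from the question's example):

import json
import pandas as pd

idx = pd.PeriodIndex(['2019', '2020'], freq='A')
series = pd.Series([1, 2], index=idx)

# convert the PeriodIndex to strings before dumping
series.index = series.index.astype(str)
json_series = series.to_json()

# after loading, rebuild the PeriodIndex from the string keys
value = json.loads(json_series)
restored = pd.Series(value)
restored.index = pd.to_datetime(restored.index).to_period('A')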

python 3 read csv UnicodeDecodeError

I have a very simple bit of code that takes in a CSV and puts it into a 2D array. It runs fine on Python 2, but on Python 3 I get the error below. Looking through the documentation, I think I need to use .decode(). Could someone please explain how to use it in the context of my code, and why I don't need to do anything on Python 2?
Error:
line 21, in
for row in datareader:
File "/usr/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 5002: invalid start byte
import csv
import sys

fullTable = sys.argv[1]
datareader = csv.reader(open(fullTable, 'r'), delimiter=',')
full_table = []
for row in datareader:
    full_table.append(row)
print(full_table)
open(sys.argv[1], 'r', encoding='ISO-8859-1')
The CSV contained characters that were not UTF-8, which seems to be the default encoding. I am, however, surprised that Python 2 dealt with this without any problems; the reason is that Python 2 read the file as raw bytes and never tried to decode it.
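Applied to the original script, the fix might look like this (a sketch; ISO-8859-1 is the encoding suggested above):

import csv
import sys

full_table = []
# the encoding matches the file's actual bytes; newline='' is what the csv docs recommend
with open(sys.argv[1], newline='', encoding='ISO-8859-1') as f:
    datareader = csv.reader(f, delimiter=',')
    for row in datareader:
        full_table.append(row)
print(full_table)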