Converting JSON files to .csv

I've found some data that someone is downloading into a JSON file (I think! - I'm a newb!). The file contains data on nearly 600 football players.
Here you can find the file
In the past, I have downloaded the json file and then used this code:
import csv
import json

json_data = open("file.json")
data = json.load(json_data)
f = csv.writer(open("fix_hists.csv", "wb+"))
arr = []
for i in data:
    fh = data[i]["fixture_history"]
    array = fh["all"]
    for j in array:
        try:
            j.insert(0, str(data[i]["first_name"]))
        except:
            j.insert(0, 'error')
        try:
            j.insert(1, data[i]["web_name"])
        except:
            j.insert(1, 'error')
        try:
            f.writerow(j)
        except:
            f.writerow(['error', 'error'])
json_data.close()
Sadly, when I run this now from the command prompt, I get the following error:
Traceback (most recent call last):
  File "fix_hist.py", line 12, in <module>
    fh = data[i]["fixture_history"]
TypeError: list indices must be integers, not str
Can this be fixed, or is there another way I can grab some of the data and convert it to .csv? Specifically the 'fixture_history', and then 'first_name', 'type_name', etc.?
Thanks in advance for any help :)

Try this tool: http://www.convertcsv.com/json-to-csv.htm
You will need to configure a few things, but it should be easy enough.
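The TypeError itself suggests that the feed's top-level JSON changed from an object keyed by player id to a plain list, so indexing with data[i] as a string key no longer works. Here is a minimal sketch of a fix under that assumption — the field names come from the question's code, but the sample data is made up for illustration:

```python
import csv
import json

# Made-up sample mirroring the assumed new structure: a list of
# player objects instead of a dict keyed by player id.
sample = [
    {"first_name": "Alan", "web_name": "Shearer",
     "fixture_history": {"all": [["12 Aug", "NEW(H)", 90]]}},
]
with open("file.json", "w") as f:
    json.dump(sample, f)

with open("file.json") as json_data:
    data = json.load(json_data)          # now a list, not a dict

with open("fix_hists.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for player in data:                  # iterate the list directly
        for fixture in player["fixture_history"]["all"]:
            writer.writerow([player.get("first_name", "error"),
                             player.get("web_name", "error")] + fixture)
```

Using dict.get() with a default also avoids the bare try/except blocks of the original.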

Related

Error trying to open json file [json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes]

I'm trying to open a json file using the json library in Python 3.8 but I have not succeeded.
This is my MWE:
import json

with open(pbit_path + file_name, 'r') as f:
    data = json.load(f)
print(data)
where pbit_path + file_name is the absolute path of the .json file. As an example, this is a sample of the .json file that I'm trying to open:
https://github.com/pwnaoj/desktop-tutorial/blob/master/DataModelSchema.json
Error returned
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I have also tried using the functions loads(), dump(), dumps().
I appreciate any suggestions
Thanks in advance.
I found a solution to my problem. In principle, it is an encoding problem: the file I am trying to read is encoded as UCS-2 (UTF-16), so in Python:

with open(file_name, mode='r', encoding='utf_16_le') as f:
    data = f.read()  # the with block closes the file automatically
data = json.loads(data)
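To see why the default encoding trips the parser, here is a small self-contained illustration (the file name and content are made up, not taken from the linked DataModelSchema.json); opening with encoding='utf-16' also handles the byte-order mark automatically:

```python
import json

# Write a JSON document encoded as UTF-16, as the questioner's
# file apparently was.
with open("schema.json", "w", encoding="utf-16") as f:
    f.write('{"name": "DataModelSchema"}')

# Reading it back with the matching codec strips the BOM and decodes
# the bytes that would otherwise raise JSONDecodeError under the
# default encoding.
with open("schema.json", encoding="utf-16") as f:
    data = json.load(f)
```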

Parsing bulk conversion from JSON to CSV

I am using the following code in Python 3 to convert ~30,000 json files to a csv.
with open('out.csv', 'w') as f:
    for fname in glob("*.json"):  # reads every .json file in the current directory
        with open(fname) as j:
            f.write(str(json.load(j)))
            f.write('\n')
The json files are timestamps and values, for example {"1501005600":956170,"1501048800":970046,...
The output currently is the whole dict written as one string per line (screenshots omitted). How can I put each timestamp and value in their own respective cells? I have tried many approaches with csv.writer but I cannot figure this out.
UPDATE
with open('out.csv', 'w') as f:
    for fname in glob("*.json"):
        with open(fname) as j:
            values = json.load(j)
            for k, v in values.items():
                f.write("{},{},".format(str(k), str(v)))
The parsing is correct, but now each .json file's data ends up on one row.
A friend helped me get to the bottom of this; I hope this may help others.
with open('[insert].csv', 'w') as f:
    for fname in glob("*.json"):
        with open(fname) as j:
            values = json.load(j)
            for k, v in values.items():
                f.write("{},{},".format(str(k), str(v)))
            f.write('\n')
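An alternative sketch using csv.writer, with each timestamp/value pair on its own row instead of one row per file — the sample file and its contents are made up here to keep the example self-contained:

```python
import csv
import json
from glob import glob

# Write one small sample file standing in for the ~30,000 real ones.
with open("sample1.json", "w") as f:
    json.dump({"1501005600": 956170, "1501048800": 970046}, f)

with open("out.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for fname in sorted(glob("sample*.json")):
        with open(fname) as j:
            for k, v in json.load(j).items():
                writer.writerow([k, v])  # one pair per row, properly quoted
```

Letting csv.writer build each row also avoids the trailing comma that manual f.write("{},{},".format(...)) leaves at the end of every line.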

Python reading records from a json file and writing to two separate json files

I have a twitter json file and I'm trying to separate the English and French tweets into two separate files. I'm using Python 2.7 with the following code:
import json

with open('tweets.json', 'r') as f:
    with open('english.json', 'w') as enF:
        with open('french.json', 'w') as frF:
            for line in f:
                tweet = json.loads(line)
                if tweet["lang"] == "en":
                    json.dump(tweet, enF, sort_keys=True)
                elif tweet["lang"] == "fr":
                    json.dump(tweet, frF, sort_keys=True)
This produces the two separate json files, one with the English tweets and the other with the French, which I have checked. The original file has one tweet per line, but english.json and french.json each contain all their tweets on a single line. I was not sure whether that would be a problem, so I passed english.json through the same code again (with the file name changed, obviously), and it gives an error:
Traceback (most recent call last):
  File "C:\Users\jack\Desktop\twitClean\j4.py", line 10, in <module>
    tweet = json.loads(line)
  File "C:\Python27\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 4926 - line 1 column 691991 (char 4925 - 691990)
I've been working on this for the past three days, and have come up with nothing. Can anyone please help and tell me what I'm doing wrong?
What about loading the json file as such:

with open('tweets.json', 'r') as f:
    tweets_dict = json.load(f)

Then, given that the Python-native representation of a JSON object is a dictionary, you can iterate over it and build your French- and English-related dictionaries as well. I mean, doing:
fr_dict, en_dict, ot_dict = {}, {}, {}
for id_, tweet in tweets_dict.items():
    if tweet['lang'] == 'fr':
        fr_dict[id_] = tweet
    elif tweet['lang'] == 'en':
        en_dict[id_] = tweet
    else:
        ot_dict[id_] = tweet

with open('french.json', 'w') as frF:
    json.dump(fr_dict, frF, sort_keys=True)
with open('english.json', 'w') as enF:
    json.dump(en_dict, enF, sort_keys=True)
with open('other.json', 'w') as otF:
    json.dump(ot_dict, otF, sort_keys=True)
SOLVED: Unfortunately, being only a Python hacker, I could not solve this using Python, though I'm sure there must be a way. In case someone else needs such a solution, here is what I found, using jq:

cat jsonfile | jq '. | select(.lang=="en")' > savefile

Obviously with this approach the json file has to be read twice, since I need the English and French tweets in separate files.
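The "Extra data" error itself has a plain Python explanation: repeated json.dump() calls on the same handle write the tweets back-to-back with no separator, so the output file becomes one line of concatenated JSON objects, which json.loads() rejects. Writing one tweet per line keeps the files re-parseable by the original line-by-line loop. A sketch with made-up tweets:

```python
import json

# Made-up tweets standing in for the real line-delimited input.
tweets = [{"lang": "en", "text": "hello"},
          {"lang": "fr", "text": "salut"}]

with open("english.json", "w") as enF, open("french.json", "w") as frF:
    for tweet in tweets:
        out = enF if tweet["lang"] == "en" else frF
        # json.dumps + '\n' keeps the file line-delimited, unlike
        # back-to-back json.dump() calls on the same handle.
        out.write(json.dumps(tweet, sort_keys=True) + "\n")
```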

can you convert a dict() to a sequence?

I have a dict() of all the NMEA sentences found in a csv. I tried creating another csv to write the results of the dict() into, for statistical and logging purposes. However, I can't, due to the dict() not being 'callable'?
import csv

def list_gps_commands(data):
    """Counts the number of times a GPS command is observed.
    Returns a dictionary object."""
    gps_cmds = dict()
    for row in data:
        try:
            gps_cmds[row[0]] += 1
        except KeyError:
            gps_cmds[row[0]] = 1
    return gps_cmds

print(list_gps_commands(read_it))
print("- - - - - - - - - - - - -")

with open('gpsresults.csv', 'w') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',', dialect='excel')
    spamwriter.writerow(list_gps_commands(read_it))
Can someone help me? Is there a way I can convert the keys/values into sequences so the csv module can recognize it? Or another way?
Use csv.DictWriter instead of csv.writer.
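A minimal sketch of that suggestion, with made-up counts: DictWriter takes the dict's keys as the header row and writes the corresponding values as the data row.

```python
import csv

# Hypothetical counts, standing in for list_gps_commands(read_it).
gps_cmds = {"$GPGGA": 120, "$GPRMC": 118, "$GPGSV": 360}

with open("gpsresults.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=list(gps_cmds))
    writer.writeheader()       # sentence names as the header row
    writer.writerow(gps_cmds)  # their counts as the data row
```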

Python 3 Pandas Error: pandas.parser.CParserError: Error tokenizing data. C error: Expected 11 fields in line 5, saw 13

I checked out this answer as I am having a similar problem.
Python Pandas Error tokenizing data
However, for some reason ALL of my rows are being skipped.
My code is simple:
import pandas as pd
fname = "data.csv"
input_data = pd.read_csv(fname)
and the error I get is:
  File "preprocessing.py", line 8, in <module>
    input_data = pd.read_csv(fname)  # raw data file ---> pandas.core.frame.DataFrame type
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 465, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 251, in _read
    return parser.read()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 710, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 1154, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 754, in pandas.parser.TextReader.read (pandas/parser.c:7391)
  File "pandas/parser.pyx", line 776, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7631)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._read_rows (pandas/parser.c:8253)
  File "pandas/parser.pyx", line 816, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8127)
  File "pandas/parser.pyx", line 1728, in pandas.parser.raise_parser_error (pandas/parser.c:20357)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 11 fields in line 5, saw 13
Solution is to use pandas' built-in delimiter "sniffing":

input_data = pd.read_csv(fname, sep=None, engine='python')  # sep=None requires the python engine
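A small demonstration of the sniffing suggestion, on made-up semicolon-delimited data (not the questioner's file):

```python
import pandas as pd
from io import StringIO

# Made-up data whose delimiter is ';' rather than ','.
raw = StringIO("a;b;c\n1;2;3\n4;5;6\n")

# sep=None asks pandas to infer the separator via csv.Sniffer,
# which only the python engine supports.
df = pd.read_csv(raw, sep=None, engine="python")
```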
For those landing here, I got this error when the file was actually an .xls file not a true .csv. Try resaving as a csv in a spreadsheet app.
I had the same error. I read my csv data using this:

d1 = pd.read_csv('my.csv')

Then I tried this:

d1 = pd.read_csv('my.csv', sep='\t')

and this time it worked. So you could try this method if your delimiter is not ',': the default is ',', so if you don't specify the separator explicitly, parsing goes wrong.
pandas.read_csv
This error means you have an unequal number of columns across rows. In your case, the rows before line 5 had 11 columns, but line 5 has 13 fields. To inspect the raw rows, you can try the following approach to open and read your file:
import csv

with open('filename.csv', 'r') as file:
    reader = csv.reader(file, delimiter=',')  # if you have a csv file, use the comma delimiter
    for row in reader:
        print(row)
This parsing error could occur for multiple reasons and solutions to the different reasons have been posted here as well as in Python Pandas Error tokenizing data.
I posted a solution to one possible reason for this error here: https://stackoverflow.com/a/43145539/6466550
I have had similar problems. With my csv files it occurs because they were created in R, which added some extra commas and different spacing compared to a "regular" csv file.
I found that if I did a read.table in R, I could then save it using write.csv and the option of row.names = F.
I could not get any of the read options in pandas to help me.
The problem could be that one or more rows of the csv file contain more delimiters (commas ,) than expected. It is solved when each row matches the number of delimiters in the first line of the csv file, where the column names are defined.
Use \t+ in the separator pattern instead of \t:

import pandas as pd

fname = "data.csv"
input_data = pd.read_csv(fname, sep='\t+', header=None, engine='python')  # regex separators need the python engine