I'm reading JSON arrays from a text file and then creating an empty DataFrame. I want to add a new column 'id' to the empty DataFrame; the 'id' values come from the JSON arrays in the text file.
The error message reads "Cannot set a frame with no defined index and a value that cannot be converted to a Series". I tried to work around this error by defining the DataFrame size up front, which did not help. Any ideas?
import json
import pandas as pd
path = 'my/path'
mydata = []
myfile = open(path, "r")
for line in myfile:
    try:
        myline = json.loads(line)
        mydata.append(myline)
    except:
        continue
mydf = pd.DataFrame()
mydf['id'] = map(lambda myline: myline['id'], mydata)
I think it is better to extract only the ids into a list and then build the DataFrame with the constructor:
for line in myfile:
    try:
        #extract only id to list
        myline = json.loads(line)['id']
        mydata.append(myline)
    except:
        continue
print (mydata)
[10, 5]
#create DataFrame by constructor
mydf = pd.DataFrame({'id':mydata})
print (mydf)
   id
0  10
1   5
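A likely cause of the original error, assuming Python 3, is that map() returns a lazy iterator rather than a list, and pandas cannot always turn that into a column on a frame that has no index yet. Wrapping the call in list() or collecting the ids first avoids the problem. The sketch below is self-contained: `sample_file` is a hypothetical stand-in for the real text file, and the narrower `except` clause is a suggestion, not part of the original answer.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for the text file: one JSON object per line,
# plus one malformed line that should be skipped.
sample_file = io.StringIO(
    '{"id": 10, "name": "a"}\n'
    'not json\n'
    '{"id": 5, "name": "b"}\n'
)

ids = []
for line in sample_file:
    try:
        ids.append(json.loads(line)["id"])
    except (ValueError, KeyError):
        # ValueError covers malformed JSON, KeyError a record without 'id'
        continue

# Build the frame from a concrete list instead of assigning an iterator
mydf = pd.DataFrame({"id": ids})
print(mydf)
```

Catching only ValueError and KeyError keeps unrelated problems, such as a typo in a variable name, from being silently swallowed.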
Related
I am struggling to convert a JSON file to a CSV file. Any help would be appreciated. I am using Python 3.
Code
import json
import urllib.request
url = 'https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=usd&days=1&interval=daily&sparkline=false'
req = urllib.request.Request(url)
##parsing response
myfile=open("coingecko1.csv","w",encoding="utf8")
headers="Prices,MrkCap,TolVol \n"
myfile.write(headers)
r = urllib.request.urlopen(req).read()
cont = json.loads(r.decode('utf-8'))
print (cont)#Just to check json result
for market in cont:
    prices =(cont["prices"])
    market_caps = (cont["market_caps"])
    total_volumes = (cont["total_volumes"])
    content= prices+","+str(market_caps)+","+str(total_volumes)+" \n"
    myfile.write(content)
print("job complete")
Python Result
{'prices': [[1629331200000, 45015.46554608543], [1629361933000, 44618.52978218442]], 'market_caps': [[1629331200000, 847143004614.999], [1629361933000, 837151985590.3453]], 'total_volumes': [[1629331200000, 34668999387.83819], [1629361933000, 33367392889.386738]]}
Traceback (most recent call last):
  File "ma1.py", line 22, in <module>
    content= prices+","+str(market_caps)+","+str(total_volumes)+" \n"
TypeError: can only concatenate list (not "str") to list
CSV Result
Thank You
Your JSON is nested: each key maps to a list of [timestamp, value] lists. To write it to CSV easily, you must flatten it first.
I've reformatted the code to dump to CSV. Check below:
import csv
import json
import urllib.request
url = 'https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=usd&days=1&interval=daily&sparkline=false'
req = urllib.request.Request(url)
r = urllib.request.urlopen(req).read()
cont = json.loads(r.decode('utf-8'))
# flatten the JSON data to read csv easily
flatten_data = {}
for key in cont:
    for value in cont[key]:
        if value[0] not in flatten_data:
            flatten_data[value[0]] = {}
        flatten_data[value[0]].update({key: value[1]})
# write csv with DictWriter (newline='' avoids blank rows on Windows)
with open('coingecko1.csv', 'w', encoding='utf-8', newline='') as csvfile:
    headers = ['Item', 'Prices', 'MrkCap', 'TolVol']
    writer = csv.DictWriter(csvfile, fieldnames=headers)
    writer.writeheader()
    for k, v in flatten_data.items():
        v.update({'Item': k})
        # renamed the columns as required
        v['Prices'] = v.pop('prices')
        v['MrkCap'] = v.pop('market_caps')
        v['TolVol'] = v.pop('total_volumes')
        writer.writerow(v)
print("job complete")
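The TypeError in the question comes from `prices + ","`: `prices` is a list, and a list cannot be concatenated with a string. The flattening idea can be tried without a network call, as in the sketch below. It reuses the response printed in the question as sample data; the helper name `flatten`, the `column_names` mapping, and writing to an in-memory buffer instead of a file are assumptions made for illustration.

```python
import csv
import io

# Sample response copied from the question's printed output
cont = {
    "prices": [[1629331200000, 45015.46554608543], [1629361933000, 44618.52978218442]],
    "market_caps": [[1629331200000, 847143004614.999], [1629361933000, 837151985590.3453]],
    "total_volumes": [[1629331200000, 34668999387.83819], [1629361933000, 33367392889.386738]],
}

# Maps the JSON keys to the CSV column names used in the question
column_names = {"prices": "Prices", "market_caps": "MrkCap", "total_volumes": "TolVol"}


def flatten(cont):
    """Turn {key: [[timestamp, value], ...]} into one row dict per timestamp."""
    rows = {}
    for key, pairs in cont.items():
        for timestamp, value in pairs:
            row = rows.setdefault(timestamp, {"Item": timestamp})
            row[column_names[key]] = value
    return list(rows.values())


rows = flatten(cont)

# Write to an in-memory buffer so the sketch leaves no file behind
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Item", "Prices", "MrkCap", "TolVol"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Replacing the buffer with `open('coingecko1.csv', 'w', newline='')` would write the same rows to disk.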
I am using pandas to load a JSON file and output it to Excel via ExcelWriter. "NaN" is a valid value in the JSON and is getting stripped in the spreadsheet. How can I store the NaN value?
Here's the json input file (simple_json_test.json)
{"action_time":"2020-04-23T07:39:51.918Z","triggered_value":"NaN"}
{"action_time":"2020-04-23T07:39:51.918Z","triggered_value":"2"}
{"action_time":"2020-04-23T07:39:51.918Z","triggered_value":"1"}
{"action_time":"2020-04-23T07:39:51.918Z","triggered_value":"NaN"}
Here's the python code:
import pandas as pd
from datetime import datetime
with open('simple_json_test.json', 'r') as f:
    data = f.readlines()
data = map(lambda x: x.rstrip(), data)
data_json_str = "[" + ','.join(data) + "]"
df = pd.read_json(data_json_str)
# Excel does not support timezone-aware datetimes, so drop the timezone
df['action_time'] = df['action_time'].dt.tz_localize(None)
# Write the dataframe to excel
writer = pd.ExcelWriter('jsonNaNExcelTest.xlsx', engine='xlsxwriter',datetime_format='yyy-mm-dd hh:mm:ss.000')
df.to_excel(writer, header=True, sheet_name='Pandas_Test',index=False)
# Widen the columns
worksheet = writer.sheets['Pandas_Test']
worksheet.set_column('A:B', 25)
# close() also saves the file; save() was removed in pandas 2.0
writer.close()
Here's the output excel file:
Once that basic question is answered, I want to be able to specify the columns in which "NaN" is a valid value, so that it is saved to Excel.
The default action for to_excel() is to convert NaN to the empty string ''. See the Pandas docs for to_excel() and the na_rep parameter.
You can specify an alternative like this:
df.to_excel(writer, header=True, sheet_name='Pandas_Test',
            index=False, na_rep='NaN')
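Note that na_rep applies to the whole sheet. For the follow-up about choosing specific columns, one possible approach is to replace missing values with the text "NaN" only in those columns before writing, and leave na_rep at its default. This is a minimal sketch; the helper name `mark_nan_as_text` and the sample columns are hypothetical.

```python
import numpy as np
import pandas as pd


def mark_nan_as_text(df, columns):
    """Return a copy where missing values in the given columns become the string 'NaN'."""
    out = df.copy()
    for col in columns:
        # Cast to object first so a float column can hold the string marker
        as_object = out[col].astype(object)
        out[col] = as_object.where(out[col].notna(), "NaN")
    return out


# Hypothetical data: NaN is meaningful in 'triggered_value' but not in 'other'
df = pd.DataFrame({"triggered_value": [np.nan, 2.0, 1.0], "other": [np.nan, 5.0, 6.0]})
prepared = mark_nan_as_text(df, ["triggered_value"])
print(prepared)

# prepared.to_excel(writer, sheet_name='Pandas_Test', index=False)
# would then write "NaN" in 'triggered_value' and an empty cell in 'other'.
```

The trade-off is that the selected columns become mixed text and numbers in the spreadsheet, which may matter if the cells are used in Excel formulas.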
This line:
else:
    #add to this
    nutrients_totals_df = pd.read_json(total_nutrients_json, orient='split')
is throwing the error.
I write my json like:
nutrients_json = nutrients_df.to_json(date_format='iso', orient='split')
Then I stash it in a hidden div or dcc.Store in one callback and read it back in another callback. How do I fix this error?
When I read JSON files that I've written with pandas, I use the function below and call it inside json.loads().
import json
def read_json_file_from_local(fullpath):
    """read json file from local"""
    with open(fullpath, 'rb') as f:
        data = f.read().decode('utf-8')
    return data
# note: json.loads() returns plain Python objects (a dict for orient='split'), not a DataFrame
df = json.loads(read_json_file_from_local(fullpath))
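Since the question does not show the error text, the following is only a sketch of a round trip that is known to work for orient='split'. It shows two ways to get a DataFrame back from the stored string: pd.read_json on a file-like wrapper, and json.loads followed by the DataFrame constructor. The sample frame and its column names are hypothetical.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for nutrients_df
nutrients_df = pd.DataFrame(
    {"protein": [10.5, 3.2], "fat": [1.0, 0.4]},
    index=["egg", "rice"],
)

# Serialize the same way as in the question
nutrients_json = nutrients_df.to_json(date_format="iso", orient="split")

# Option 1: read_json on a file-like object (recent pandas versions warn
# when a literal JSON string is passed directly)
restored = pd.read_json(io.StringIO(nutrients_json), orient="split")

# Option 2: parse with json.loads and rebuild the frame from its three parts
parsed = json.loads(nutrients_json)
manual = pd.DataFrame(parsed["data"], index=parsed["index"], columns=parsed["columns"])

print(restored)
print(manual)
```

If the stored value can be None or empty, for example before the first callback has run, checking for that before calling read_json is a reasonable first thing to rule out.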
I am trying to convert the CSV files in a folder to a single JSON file. The code below does the job, but the JSON file has the first CSV written several times. I guess I am going wrong with assigning the data variable. Please help me fix it.
import csv, json, os
dir_path = 'C:/Users/USER/Desktop/output_files'
inputfiles = [file for file in os.listdir(dir_path) if file.endswith('.csv')]
outputfile = "data_backup1.json"
for file in inputfiles:
    filepath = os.path.join(dir_path, file)
    data = {}
    with open(filepath, "r") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            id = row['ID']
            data[id] = row
            with open(outputfile, "a") as jsonfile:
                jsonfile.write(json.dumps(data, indent=4))
Expected output: Json file needs to have each csv written only once into it.
If all of the rows in your .csv files have different ['ID'] values, the dictionary keys you assign will be unique. In that case, your dictionary grows by one entry for each row the reader yields.
Change the indentation of the jsonfile.write() call as shown below so that each CSV file is written to the .json file only once. To sort your entries, you can pass sort_keys=True to json.dumps().
for file in inputfiles:
    filepath = os.path.join(dir_path, file)
    data = {}
    with open(filepath, "r") as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            id = row['ID']
            data[id] = row
    with open(outputfile, "a") as jsonfile:
        jsonfile.write(json.dumps(data, indent=4, sort_keys=True))
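Appending one JSON object per CSV file leaves several top-level objects in the output, which most JSON parsers will not read back as a single document. If a single valid JSON document is wanted, one option is to collect all rows into one dictionary and write it once after the loop. The sketch below assumes every CSV has an 'ID' column, as in the question; the function name `merge_csv_folder_to_json` is hypothetical.

```python
import csv
import json
import os


def merge_csv_folder_to_json(dir_path, outputfile):
    """Collect rows from every .csv in dir_path, keyed by 'ID', and write one JSON file."""
    data = {}  # created once, so rows from all files accumulate
    for name in sorted(os.listdir(dir_path)):
        if not name.endswith(".csv"):
            continue
        with open(os.path.join(dir_path, name), "r", newline="") as csvfile:
            for row in csv.DictReader(csvfile):
                # A repeated ID overwrites the row stored earlier
                data[row["ID"]] = row
    # "w" rather than "a": the file is written exactly once, after the loop
    with open(outputfile, "w") as jsonfile:
        json.dump(data, jsonfile, indent=4, sort_keys=True)
    return data
```

Calling `merge_csv_folder_to_json(dir_path, "data_backup1.json")` with the folder from the question would produce one dictionary covering all files. If IDs can repeat across files, a key that combines the file name and the ID would avoid overwriting.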
The format in the file looks like this
{ 'match' : 'a', 'score' : '2'},{......}
I've tried pd.DataFrame and I've also tried reading it line by line, but it gives me everything in one cell.
I'm new to Python.
Thanks in advance
Expected result is a pandas dataframe
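Try using the json_normalize() function.
Example:
import pandas as pd
values = [{'match': 'a', 'score': '2'}, {'match': 'b', 'score': '3'}, {'match': 'c', 'score': '4'}]
# pd.json_normalize is the top-level name since pandas 1.0; the pandas.io.json import path is deprecated
df = pd.json_normalize(values)
print(df)
Output:
  match score
0     a     2
1     b     3
2     c     4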
If one line of your file corresponds to one JSON object, you can do the following:
# import library for working with JSON and pandas
import json
import pandas as pd
# make an empty list
data = []
# open your file and add every row as a dict to the list with data
with open("/path/to/your/file", "r") as file:
    for line in file:
        data.append(json.loads(line))
# make a pandas data frame
df = pd.DataFrame(data)
If there is more than one JSON object on a single line of your file, you first have to find those JSON objects. One possible approach scans the text with JSONDecoder.raw_decode(); a solution along those lines would look like this:
# import all you will need
import pandas as pd
import json
from json import JSONDecoder
# define function
def extract_json_objects(text, decoder=JSONDecoder()):
    pos = 0
    while True:
        match = text.find('{', pos)
        if match == -1:
            break
        try:
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            pos = match + 1
# make an empty list
data = []
# open your file and add every JSON object as a dict to the list with data
with open("/path/to/your/file", "r") as file:
    for line in file:
        for item in extract_json_objects(line):
            data.append(item)
# make a pandas data frame
df = pd.DataFrame(data)
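One caveat: the sample in the question uses single quotes, which strict JSON parsers such as json.loads() reject. If the file really contains Python-style dict literals separated by commas, a possible workaround is ast.literal_eval, which safely evaluates literals without running arbitrary code. This is a sketch under that assumption; the second record in `text` is made up, because the question elides the rest of the file.

```python
import ast

import pandas as pd

# First record copied from the question; the second is a hypothetical example
text = "{ 'match' : 'a', 'score' : '2'},{ 'match' : 'b', 'score' : '3'}"

# Wrap the comma-separated dicts in brackets so they parse as one list literal
records = ast.literal_eval("[" + text + "]")

df = pd.DataFrame(records)
print(df)
```

In practice `text` would come from reading the whole file, for example `open(path).read()`. If the file is valid JSON with double quotes, the json-based approaches above are the better fit.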