Convert dataframe to JSON using Python - json

I have been trying to convert a dataframe to JSON using Python. I am able to do it successfully but i am not getting the required format of JSON.
Code -
df1 = df.rename_axis('CUST_ID').reset_index()
df.to_json('abc.json')
Here, abc.json is the filename of JSON and df is the required dataframe.
What I am getting -
{"CUST_LAST_UPDATED":
{"1000":1556879045879.0,"1001":1556879052416.0},
"CUST_NAME":{"1000":"newly
updated_3_file","1001":"heeloo1"}}
What I want -
[{"CUST_ID":1000,"CUST_NAME":"newly
updated_3_file","CUST_LAST_UPDATED":1556879045879},
{"CUST_ID":1001,"CUST_NAME":"heeloo1","CUST_LAST_UPDATED":1556879052416}]
Error -
Traceback (most recent call last):
File
"C:/Users/T/PycharmProject/test_pandas.py",
line 19, in <module>
df1 = df.rename_axis('CUST_ID').reset_index()
File "C:\Users\T\AppData\Local\Programs\Python\Python36\lib\site-
packages\pandas\core\frame.py", line 3379, in reset_index
new_obj.insert(0, name, level_values)
File "C:\Users\T\AppData\Local\Programs\Python\Python36\lib\site-
packages\pandas\core\frame.py", line 2613, in insert
allow_duplicates=allow_duplicates)
File "C:\Users\T\AppData\Local\Programs\Python\Python36\lib\site-
packages\pandas\core\internals.py", line 4063, in insert
raise ValueError('cannot insert {}, already exists'.format(item))
ValueError: cannot insert CUST_ID, already exists
df.head() Output -
CUST_ID CUST_LAST_UPDATED CUST_NAME
0 1000 1556879045879 newly updated_3_file
1 1001 1556879052416 heeloo1
How to change the format while converting dataframe to JSON?

Use DataFrame.rename_axis with DataFrame.reset_index for column from index and then DataFrame.to_json with orient='records':
df1 = df.rename_axis('CUST_ID').reset_index()
df1.to_json('abc.json', orient='records')
[{"CUST_ID":"1000",
"CUST_LAST_UPDATED":1556879045879.0,
"CUST_NAME":"newly updated_3_file"},
{"CUST_ID":"1001",
"CUST_LAST_UPDATED":1556879052416.0,
"CUST_NAME":"heeloo1"}]
EDIT:
Because there is default index in data, use:
df1.to_json('abc.json', orient='records')
Verify:
print (df1.to_json(orient='records'))
[{"CUST_ID":1000,
"CUST_LAST_UPDATED":1556879045879,
"CUST_NAME":"newly pdated_3_file"},
{"CUST_ID":1001,
"CUST_LAST_UPDATED":1556879052416,
"CUST_NAME":"heeloo1"}]

You can convert a dataframe to a jason format using to_dict:
df1.to_dict('records')
the outpit would the one that you need.

Suppose if dataframe has nan values in each row and you don't want them in your json file. Follow below code
import pandas as pd
from pprint import pprint
import json
import argparse
if __name__=="__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--csv")
parser.add_argument("--json")
args = parser.parse_args()
entities=pd.read_csv(args.csv)
json_data=[row.dropna().to_dict() for index,row in entities.iterrows()]
with open(args.json,"w") as file:
json.dump(json_data,file)

Related

Pandas ExcelWriter removing valid NaN values when they are needed in output spreadsheet

I am using Pandas to load a json file and output it to Excel via the ExcelWriter. "NaN" is a valid value in the json and is getting stripped in the spreadsheet. How can I store the NaN value.
Here's the json input file (simple_json_test.json)
{"action_time":"2020-04-23T07:39:51.918Z","triggered_value":"NaN"}
{"action_time":"2020-04-23T07:39:51.918Z","triggered_value":"2"}
{"action_time":"2020-04-23T07:39:51.918Z","triggered_value":"1"}
{"action_time":"2020-04-23T07:39:51.918Z","triggered_value":"NaN"}
Here's the python code:
import pandas as pd
from datetime import datetime
with open('simple_json_test.json', 'r') as f:
data = f.readlines()
data = map(lambda x: x.rstrip(), data)
data_json_str = "[" + ','.join(data) + "]"
df = pd.read_json(data_json_str)
# Write dataframe to excel
df['action_time'] = df['action_time'].dt.tz_localize(None)
# Write the dataframe to excel
writer = pd.ExcelWriter('jsonNaNExcelTest.xlsx', engine='xlsxwriter',datetime_format='yyy-mm-dd hh:mm:ss.000')
df.to_excel(writer, header=True, sheet_name='Pandas_Test',index=False)
# Widen the columns
worksheet = writer.sheets['Pandas_Test']
worksheet.set_column('A:B', 25)
writer.save()
Here's the output excel file:
Once that basic question is answer, i want to be able to specify which columns "NaN' is a valid value so save it to excel.
The default action for to_excel() is to convert NaN to the empty string ''. See the Pandas docs for to_excel() and the na_rep parameter.
You can specify an alternative like this:
df.to_excel(writer, header=True, sheet_name='Pandas_Test',
index=False, na_rep='NaN')

line-delimited json format txt file, how to import with pandas

I have a line-delimited Json format txt file. The format of the file is .txt. Now I want to import it with pandas. Usually I can import with
df = pd.read_csv('df.txt')
df = pd.read_json('df.txt')
df = pd.read_fwf('df.txt')
they all give me an error.
ParserError: Error tokenizing data. C error: Expected 29 fields in line 1354, saw 34
ValueError: Trailing data
this returns the data, but the data is organized in a weird way where column name is in the left, next to the data
can anyone tells me how to solve this?
pd.read_json('df.txt', lines=True)
read_json accepts a boolean argument lines which will Read the file as a json object per line.

How to convert this json file to pandas dataframe

The format in the file looks like this
{ 'match' : 'a', 'score' : '2'},{......}
I've tried pd.DataFrame and I've also tried reading it by line but it gives me everything in one cell
I'm new to python
Thanks in advance
Expected result is a pandas dataframe
Try use json_normalize() function
Example:
from pandas.io.json import json_normalize
values = [{'match': 'a', 'score': '2'}, {'match': 'b', 'score': '3'}, {'match': 'c', 'score': '4'}]
df = json_normalize(values)
print(df)
Output:
If one line of your file corresponds to one JSON object, you can do the following:
# import library for working with JSON and pandas
import json
import pandas as pd
# make an empty list
data = []
# open your file and add every row as a dict to the list with data
with open("/path/to/your/file", "r") as file:
for line in file:
data.append(json.loads(line))
# make a pandas data frame
df = pd.DataFrame(data)
If there is more than only one JSON object on one row of your file, then you should find those JSON objects, for example here are two possible options. The solution with the second option would look like this:
# import all you will need
import pandas as pd
import json
from json import JSONDecoder
# define function
def extract_json_objects(text, decoder=JSONDecoder()):
pos = 0
while True:
match = text.find('{', pos)
if match == -1:
break
try:
result, index = decoder.raw_decode(text[match:])
yield result
pos = match + index
except ValueError:
pos = match + 1
# make an empty list
data = []
# open your file and add every JSON object as a dict to the list with data
with open("/path/to/your/file", "r") as file:
for line in file:
for item in extract_json_objects(line):
data.append(item)
# make a pandas data frame
df = pd.DataFrame(data)

Export JSON to CSV using Python

I wrote a code to extract some information from a website. the output is in JSON and I want to export it to CSV. So, I tried to convert it to a pandas dataframe and then export it to CSV in pandas. I can print the results but still, it doesn't convert the file to a pandas dataframe. Do you know what the problem with my code is?
# -*- coding: utf-8 -*-
# To create http request/session
import requests
import re, urllib
import pandas as pd
from BeautifulSoup import BeautifulSoup
url = "https://www.indeed.com/jobs?
q=construction%20manager&l=Houston&start=10"
# create session
s = requests.session()
html = s.get(url).text
# exctract job IDs
job_ids = ','.join(re.findall(r"jobKeysWithInfo\['(.+?)'\]", html))
ajax_url = 'https://www.indeed.com/rpc/jobdescs?jks=' +
urllib.quote(job_ids)
# do Ajax request and convert the response to json
ajax_content = s.get(ajax_url).json()
print(ajax_content)
#Convert to pandas dataframe
df = pd.read_json(ajax_content)
#Export to CSV
df.to_csv("c:\\users\\Name\desktop\\newcsv.csv")
The error message is:
Traceback (most recent call last):
File "C:\Users\Mehrdad\Desktop\Indeed 06.py", line 21, in
df = pd.read_json(ajax_content)
File "c:\python27\lib\site-packages\pandas\io\json\json.py", line 408, in read_json
path_or_buf, encoding=encoding, compression=compression,
File "c:\python27\lib\site-packages\pandas\io\common.py", line 218, in get_filepath_or_buffer
raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type:
The problem was that nothing was going into the dataframe when you called read_json() because it was a nested JSON dict:
import requests
import re, urllib
import pandas as pd
from pandas.io.json import json_normalize
url = "https://www.indeed.com/jobs?q=construction%20manager&l=Houston&start=10"
s = requests.session()
html = s.get(url).text
job_ids = ','.join(re.findall(r"jobKeysWithInfo\['(.+?)'\]", html))
ajax_url = 'https://www.indeed.com/rpc/jobdescs?jks=' + urllib.quote(job_ids)
ajax_content= s.get(ajax_url).json()
df = json_normalize(ajax_content).transpose()
df.to_csv('your_output_file.csv')
Note that I called json_normalize() to collapse the nested columns from the JSON. I also called transpose() so that the rows were labelled with the job ID rather than columns. This will give you a dataframe that looks like this:
0079ccae458b4dcf <p><b>Company Environment: </b></p><p>Planet F...
0c1ab61fe31a5c62 <p><b>Commercial Construction Project Manager<...
0feac44386ddcf99 <div><div>Trendmaker Homes is currently seekin...
...
It's not really clear what your expected output is, though ... what are you expecting the DataFrame/CSV file to look like?. If you actually were looking for just a single row/Series with the job ID's as column labels, just remove the call to transpose()

Convert Pandas DataFrame to JSON format

I have a Pandas DataFrame with two columns – one with the filename and one with the hour in which it was generated:
File Hour
F1 1
F1 2
F2 1
F3 1
I am trying to convert it to a JSON file with the following format:
{"File":"F1","Hour":"1"}
{"File":"F1","Hour":"2"}
{"File":"F2","Hour":"1"}
{"File":"F3","Hour":"1"}
When I use the command DataFrame.to_json(orient = "records"), I get the records in the below format:
[{"File":"F1","Hour":"1"},
{"File":"F1","Hour":"2"},
{"File":"F2","Hour":"1"},
{"File":"F3","Hour":"1"}]
I'm just wondering whether there is an option to get the JSON file in the desired format. Any help would be appreciated.
The output that you get after DF.to_json is a string. So, you can simply slice it according to your requirement and remove the commas from it too.
out = df.to_json(orient='records')[1:-1].replace('},{', '} {')
To write the output to a text file, you could do:
with open('file_name.txt', 'w') as f:
f.write(out)
In newer versions of pandas (0.20.0+, I believe), this can be done directly:
df.to_json('temp.json', orient='records', lines=True)
Direct compression is also possible:
df.to_json('temp.json.gz', orient='records', lines=True, compression='gzip')
I think what the OP is looking for is:
with open('temp.json', 'w') as f:
f.write(df.to_json(orient='records', lines=True))
This should do the trick.
use this formula to convert a pandas DataFrame to a list of dictionaries :
import json
json_list = json.loads(json.dumps(list(DataFrame.T.to_dict().values())))
Try this one:
json.dumps(json.loads(df.to_json(orient="records")))
convert data-frame to list of dictionary
list_dict = []
for index, row in list(df.iterrows()):
list_dict.append(dict(row))
save file
with open("output.json", mode) as f:
f.write("\n".join(str(item) for item in list_dict))
To transform a dataFrame in a real json (not a string) I use:
from io import StringIO
import json
import DataFrame
buff=StringIO()
#df is your DataFrame
df.to_json(path_or_buf=buff,orient='records')
dfJson=json.loads(buff)
instead of using dataframe.to_json(orient = “records”)
use dataframe.to_json(orient = “index”)
my above code convert the dataframe into json format of dict like {index -> {column -> value}}
Here is small utility class that converts JSON to DataFrame and back: Hope you find this helpful.
# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize
class DFConverter:
#Converts the input JSON to a DataFrame
def convertToDF(self,dfJSON):
return(json_normalize(dfJSON))
#Converts the input DataFrame to JSON
def convertToJSON(self, df):
resultJSON = df.to_json(orient='records')
return(resultJSON)