Pandas read_json - Stop converting index to datetime

I have the data below in a file named a.json.
{
"1000000000": {
"TEST": 2
}
}
import pandas as pd
df = pd.read_json(r"a.json", dtype= str, orient='index', convert_dates=False)
print(df)
The output I get is:
TEST
2001-09-09 01:46:40 2
Expected output:
TEST
1000000000 2

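You need the parameter convert_axes=False in read_json. Setting convert_dates=False alone does not stop the axis labels from being converted; convert_axes is what controls whether the index and column labels are coerced, which is how 1000000000 ends up as a timestamp.
You can use it like this: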
import pandas as pd
df = pd.read_json("a.json", dtype= str, orient='index',convert_axes=False ,convert_dates=False)
print(df)
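To see the effect without touching a file, here is a minimal sketch that feeds the same JSON through StringIO. The variable names are only for illustration, and the exact result of the default conversion may differ between pandas versions.

```python
from io import StringIO

import pandas as pd

raw = '{"1000000000": {"TEST": 2}}'

# Default behaviour: convert_axes is on, so the numeric-looking key
# may be turned into a timestamp when it becomes the index.
converted = pd.read_json(StringIO(raw), orient="index")

# convert_axes=False leaves the index labels as they appear in the JSON.
kept = pd.read_json(StringIO(raw), orient="index", convert_axes=False)

print(converted.index[0])
print(kept.index[0])
```

With convert_axes=False the index label stays the original key, so it can be compared against the string "1000000000".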

Related

JSON list items to dataframe

I am using an API that has changed its spec and now my JSON feed is slightly different.
BEFORE:
{"code":2000,"message":"SUCCESS","data":
{"1":
{"id":1,
"name":"Amanda",
"score":"57.36%",
"average":"53.47%"
}
}
}
Then, I used something to this effect:
import json
import pandas as pd
jsonfile = 'file.json'
with open(jsonfile) as j:
    data = json.load(j)
rows = [v for k, v in data["data"].items()]
df = pd.DataFrame(rows, columns=['id', 'name', 'score', 'average'])
Source AFTER:
{"status":"success","code":0,"data":
{"data":
[
{
"id":1,
"name":"Robert",
"score":"48.85%",
"average":"40.52%"
}
]
}
}
So I'm attempting to adjust using some of the resources:
Convert JSON list to pandas dataframe
JSON to pandas DataFrame
how to convert json data with list of list to dataframe using python pandas
What I've tried so far:
import json
import pandas as pd
from pandas import json_normalize
jsonfile = 'file.json'
with open(jsonfile) as j:
    data = json.load(j)
df = json_normalize(data, ['data'])
I've also tried:
df = pd.DataFrame.from_records(data)
I get the following:
TypeError: {....} for path data. Must be list or null.
What am I missing here?
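The error message hints at the cause: in the new feed, the top-level "data" key holds a dict, and the list of records sits one level deeper under a second "data" key. A minimal sketch of two ways to reach it, assuming the payload always has the shape shown in the AFTER sample (the inline dict below stands in for the loaded file):

```python
import pandas as pd

# Sample payload copied from the "AFTER" feed above.
data = {
    "status": "success",
    "code": 0,
    "data": {
        "data": [
            {"id": 1, "name": "Robert", "score": "48.85%", "average": "40.52%"}
        ]
    },
}

# Option 1: index down to the list and pass it to the constructor.
df = pd.DataFrame(data["data"]["data"], columns=["id", "name", "score", "average"])

# Option 2: give json_normalize the full path to the list of records.
df_norm = pd.json_normalize(data, record_path=["data", "data"])

print(df)
print(df_norm)
```

Both options produce one row per record; which one reads better is mostly a matter of taste.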

Python: how to convert JSON that contains multiple arrays to a pandas DataFrame

Hi, I'm having trouble converting JSON to a DataFrame using pandas. Here is my attempt:
import json
import pandas as pd
with open('write.json') as f:
    data = json.load(f)
df = pd.DataFrame.from_dict(data,orient = 'index').reset_index()
print(df)
and here is the json file
{"_id":"60b53d92ccb1483964da45f9","Avg_sm":[26.66953125,26.66953125,26.666666666666668,26.666666666666668,26.666666666666668,26.666666666666668,26.666666666666668,26.666666666666668,26.6647859922179,26.6647859922179,26.45263157894737,26.45263157894737],"Avg_st":[22.6517578125,22.6517578125,22.65204678362573,22.65204678362573,22.65204678362573,22.65204678362573,22.65204678362573,22.65204678362573,22.65272373540856,22.65272373540856,22.694567062818336,22.694567062818336],"SensorCoordinates":[10.33363276545083,36.8434191667489],"SensorIdentifier":["CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC","CCCCCCCCCCCCCCCC"],"count":24,"date":["25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","25-06-2021","26-06-2021","26-06-2021","26-06-2021","26-06-2021"],"min_sm":[21.1,21.1,21.1,21.1,21.1,21.1,21.1,21.1,21.1,21.1,21.1,21.1],"sensorId":["60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285","60b54789a
21c170aecb25285","60b54789a21c170aecb25285","60b54789a21c170aecb25285"],"status":[true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true,true]}
If I understand correctly, you can try:
df=pd.json_normalize(data).apply(pd.Series.explode,ignore_index=True)
OR
df = pd.DataFrame.from_dict(data,orient = 'index').T.apply(pd.Series.explode,ignore_index=True)
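To make the behaviour of the explode approach easier to follow, here is a small sketch on a cut-down record. The keys mirror the file above, but the shortened values are made up for illustration. Arrays of different lengths end up padded with NaN, and scalar fields such as count appear only in the first row.

```python
import pandas as pd

# Cut-down stand-in for the record above: arrays of unequal length plus scalars.
data = {
    "_id": "abc",
    "Avg_sm": [26.6, 26.4],
    "status": [True, True, True],
    "count": 3,
}

# json_normalize yields a single row whose cells hold the whole lists.
wide = pd.json_normalize(data)

# Exploding each column separately turns every list element into its own row;
# columns with fewer elements are filled with NaN when the results are aligned.
df = wide.apply(pd.Series.explode, ignore_index=True)

print(df)
```

If the shorter arrays should be repeated or aligned differently (for example, 12 averages against 24 readings), that has to be handled separately; explode only stacks the values.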

Pandas DataFrame: extracting a value from the JSON content returned by a request

I want to extract a value from the JSON content returned by a request and store it in a pandas DataFrame.
import pandas as pd
import json
import requests
import ast
from pandas import json_normalize
df['response'] = df.URL.apply(lambda u: requests.get(u).content)
df.head()
b'{"error":0,"short":"http:\\/\\/192.168.42.72\\/ECyKY"}'
b'{"error":0,"short":"http:\\/\\/192.168.42.72\\/IsMgE"}'
When we use Python without Pandas, we can just use:
resp = requests.get(u)
y=resp.json()
print(y)
print(y['short'])
to get the short value, e.g. "http://192.168.42.72/ECyKY".
I have spent hours trying to get this to work with pandas without luck. Any hints?
Instead of using requests.get(u).content directly, use requests.get(u).json(), then use Series.str.get to extract the value for the key short from each dictionary and assign it to a new column short:
df['response'] = df['URL'].apply(lambda u: requests.get(u).json())
df['short'] = df['response'].str.get('short')
# print(df)
response short
0 {'error': 0, 'short': 'http://192.168.42.72/EC... http://192.168.42.72/ECyKY
1 {'error': 0, 'short': 'http://192.168.42.72/Is... http://192.168.42.72/IsMgE
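If the response column already holds the raw bytes from .content, it may not be necessary to repeat the requests. A minimal sketch, assuming every cell contains valid JSON bytes like the two rows shown in the question (the parsed column name is just for illustration, and no network call is made here):

```python
import json

import pandas as pd

# Stand-in for the bytes that requests.get(u).content returned in the question.
df = pd.DataFrame(
    {
        "response": [
            b'{"error":0,"short":"http:\\/\\/192.168.42.72\\/ECyKY"}',
            b'{"error":0,"short":"http:\\/\\/192.168.42.72\\/IsMgE"}',
        ]
    }
)

# json.loads accepts bytes, so each cell can be decoded into a dict directly.
df["parsed"] = df["response"].apply(json.loads)

# Series.str.get also works on dict elements and looks up the given key.
df["short"] = df["parsed"].str.get("short")

print(df["short"].tolist())
```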

How to convert this json file to pandas dataframe

The format in the file looks like this
{ 'match' : 'a', 'score' : '2'},{......}
I've tried pd.DataFrame, and I've also tried reading the file line by line, but everything ends up in one cell.
I'm new to Python.
Thanks in advance.
The expected result is a pandas DataFrame.
Try the json_normalize() function.
Example:
from pandas import json_normalize
values = [{'match': 'a', 'score': '2'}, {'match': 'b', 'score': '3'}, {'match': 'c', 'score': '4'}]
df = json_normalize(values)
print(df)
Output:
   match  score
0      a      2
1      b      3
2      c      4
If one line of your file corresponds to one JSON object, you can do the following:
# import library for working with JSON and pandas
import json
import pandas as pd
# make an empty list
data = []
# open your file and add every row as a dict to the list with data
with open("/path/to/your/file", "r") as file:
    for line in file:
        data.append(json.loads(line))
# make a pandas data frame
df = pd.DataFrame(data)
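For the one-object-per-line case, pandas can also do the looping itself via read_json with lines=True. A short sketch, assuming each line is valid JSON with double quotes (the sample text below is made up to match the question's fields):

```python
from io import StringIO

import pandas as pd

# Two JSON objects, one per line, standing in for the file contents.
text = '{"match": "a", "score": "2"}\n{"match": "b", "score": "3"}\n'

# lines=True tells read_json to treat every line as a separate JSON object.
# Value types are inferred by default, so numeric-looking strings may be converted.
df = pd.read_json(StringIO(text), lines=True)

print(df)
```

For a real file, the path can be passed in place of the StringIO object.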
If a single line of your file contains more than one JSON object, you first have to find those objects in the text. There are a couple of ways to do that; one of them, based on JSONDecoder.raw_decode, looks like this:
# import all you will need
import pandas as pd
import json
from json import JSONDecoder
# define function
def extract_json_objects(text, decoder=JSONDecoder()):
    pos = 0
    while True:
        match = text.find('{', pos)
        if match == -1:
            break
        try:
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            pos = match + 1
# make an empty list
data = []
# open your file and add every JSON object as a dict to the list with data
with open("/path/to/your/file", "r") as file:
    for line in file:
        for item in extract_json_objects(line):
            data.append(item)
# make a pandas data frame
df = pd.DataFrame(data)
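One caveat: the sample in the question uses single quotes, which is not valid JSON, so json.loads and raw_decode would reject it. If the file really looks like that, a possible workaround is to parse it as Python literals instead. This is only a sketch, assuming the content is a comma-separated run of dict literals with plain string or number values; the sample text is invented to match the question's format.

```python
import ast

import pandas as pd

# Stand-in for the file contents: comma-separated dicts with single quotes.
text = "{ 'match' : 'a', 'score' : '2'},{ 'match' : 'b', 'score' : '3'}"

# Wrapping the text in brackets makes it a Python list literal, which
# ast.literal_eval can parse safely (no arbitrary code is executed).
rows = ast.literal_eval("[" + text + "]")

df = pd.DataFrame(rows)
print(df)
```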

Convert patentsview API data returned as nested JSON into a pandas dataframe

I am trying to convert a JSON derived from the patentsview API into a pandas dataframe. However it is difficult because it seems to be a nested JSON!
Here is my code:
import requests
import pandas as pd
from pandas import json_normalize
import json
url = 'http://www.patentsview.org/api/patents/query?q={"cpc_group_id":"B60W"}&f=["inventor_first_name","inventor_last_name","patent_number", "assignee_country"]'
r = requests.get(url)
json_data = r.json()
df = pd.DataFrame(json_data['patents'])
df
See the image for the dataframe that is returned.
My question is how can I get the nested dictionary keys and their values into unique columns?
One way is to take the first entry of each nested list (assignees and inventors), build a DataFrame from each, and concatenate them with patent_number:
result = pd.DataFrame.from_records(df.assignees.apply(lambda x: x[0]))
inventors = pd.DataFrame.from_records(df.inventors.apply(lambda x: x[0]))
result = pd.concat([result, inventors, df.patent_number], axis=1)
result.head()
assignee_country assignee_key_id inventor_first_name inventor_key_id \
0 None None Robert Cecil 4848
1 GB 82078 Anthony John 16057
2 None None James W. 16376
3 FR 281289 Gilles 18482
4 JP 301319 Kiyoharu 18507
inventor_last_name patent_number
0 Clerk 3932991
1 Adey 3939738
2 Moberg 3939937
3 Leconte 3941203
4 Murakami 3941223
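Taking x[0] keeps only the first inventor and assignee of each patent. If every nested entry should be kept, json_normalize with record_path and meta is another option. The sketch below uses a small hand-written list shaped like json_data['patents'], with values borrowed from the output above; the exact field set returned by the API is an assumption.

```python
import pandas as pd

# Hand-written stand-in for json_data['patents'].
patents = [
    {
        "patent_number": "3932991",
        "inventors": [
            {"inventor_first_name": "Robert Cecil", "inventor_last_name": "Clerk"}
        ],
        "assignees": [{"assignee_country": None}],
    },
    {
        "patent_number": "3939738",
        "inventors": [
            {"inventor_first_name": "Anthony John", "inventor_last_name": "Adey"}
        ],
        "assignees": [{"assignee_country": "GB"}],
    },
]

# One row per inventor, with the parent's patent_number carried along.
inventors = pd.json_normalize(patents, record_path="inventors", meta=["patent_number"])

# Same for assignees, then join the two tables on the shared key.
assignees = pd.json_normalize(patents, record_path="assignees", meta=["patent_number"])
result = inventors.merge(assignees, on="patent_number")

print(result)
```

Note that a patent with several inventors and several assignees yields one row per combination after the merge, which may or may not be what is wanted.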