I have a slightly complicated json that I need to convert into a dataframe. This is a standard output json from another API and hence the field names will not change.
I have the below dict which is more complicated than what I have worked with till now
>>> import pandas as pd
>>> data = [{'annotation_spec': {'description': 'Story_Driven',
... 'display_name': 'Story_Driven'},
... 'segments': [{'confidence': 0.52302074,
... 'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
... 'start_time_offset': {}}}]},
... {'annotation_spec': {'description': 'real', 'display_name': 'real'},
... 'segments': [{'confidence': 0.5244379,
... 'segment': {'end_time_offset': {'nanos': 973306000, 'seconds': 14},
... 'start_time_offset': {}}}]}]
I looked through all related SO posts and the closest I can get this into a dataframe is this
from pandas.io.json import json_normalize
pd.DataFrame.from_dict(json_normalize(data,record_path=
['segments'],meta=[['annotation_spec','description'],
['annotation_spec','display_name']],errors='ignore'))
This gives me an output like this
>>> from pandas.io.json import json_normalize
>>> pd.DataFrame.from_dict(json_normalize(data,record_path=['segments'],meta=[['annotation_spec','description'],['annotation_spec','display_name']],errors='ignore'))
confidence segment annotation_spec.description annotation_spec.display_name
0 0.523021 {u'end_time_offset': {u'nanos': 973306000, u's... Story_Driven Story_Driven
1 0.524438 {u'end_time_offset': {u'nanos': 973306000, u's... real real
>>>
I want to break down the "segment"column above as well into its components. How can I do that?
Basically json_normalize takes care of nested dicts, here we have a problem because of the list in the segements key.
So if the length of the list will always be 1, we can just remove the list and then apply json_normalize
### function to remove the lsit, we basically check if its a list, if so just take the first element
remove_list = lambda dct:{k:(v[0] if type(v)==list else v) for k,v in dct.items()}
data_clean = [remove_list(entry) for entry in data]
json_normalize(data_clean, sep="__")
Related
I was wondering if there is a way to remove/replace null/empty square brackets in json or pandas dataframe. I have tried to replace them after converting into string via .astype(str) and it is successful and/but it seems it converts all json values into string and I can not process further with the same structure. I would appreciate any solution/recommendation. thanks...
With the following toy dataframe:
import pandas as pd
df = pd.DataFrame({"col1": ["a", [1, 2, 3], [], "d"], "col2": ["e", [], "f", "g"]})
print(df)
# Output
Here is one way to do it:
df = df.applymap(lambda x: pd.NA if isinstance(x, list) and not x else x)
print(df)
# Output
I'm trying to convert bytes data into JSON data. I got errors in converting data.
a = b'orderId=570d3e38-d6486056e&orderAmount=10.00&referenceId=34344&txStatus=SUCCESS&txMsg=Transaction+Successful&txTime=2021-06-26+12%3A03%3A12&signature=njtH5Dzmg6RJ1KB'
I used this to convert
json.loads(a.decode('utf-8'))
and I want to get a response like
orderAmount = 10.00
orderId = 570d3e38-d6486056e
txStatus = SUCCESS
What you here see is a query string [wiki], you can read the querystring with:
from django.http import QueryDict
a = b'orderId=570d3e38-d6486056e&orderAmount=10.00&referenceId=34344&txStatus=SUCCESS&txMsg=Transaction+Successful&txTime=2021-06-26+12%3A03%3A12&signature=njtH5Dzmg6RJ1KB'
b = QueryDict(a)
with the given sample data, we have a QueryDict [Django-doc] that looks like:
>>> b
<QueryDict: {'orderId': ['570d3e38-d6486056e'], 'orderAmount': ['10.00'], 'referenceId': ['34344'], 'txStatus': ['SUCCESS'], 'txMsg': ['Transaction Successful'], 'txTime': ['2021-06-26 12:03:12'], 'signature': ['njtH5Dzmg6RJ1KB']}>
If you subscript this, then you always get the last item, so:
>>> b['orderId']
'570d3e38-d6486056e'
>>> b['orderAmount']
'10.00'
>>> b['orderId']
'570d3e38-d6486056e'
>>> b['txStatus']
'SUCCESS'
If you work with a view, you can also find this querydict of the URL with:
def my_view(request):
b = request.GET
# …
Pandas dataframe extracting value from JSON, which returned from as content from request.
import pandas as pd
import pandas as pd
import json
import requests
import ast
from pandas.io.json import json_normalize
df['response'] = df.URL.apply(lambda u: requests.get(u).content)
df.head()
b'{"error":0,"short":"http:\\/\\/192.168.42.72\\/ECyKY"}'
b'{"error":0,"short":"http:\\/\\/192.168.42.72\\/IsMgE"}'
When we use Python without Pandas, we can just use:
resp = requests.get(u)
y=resp.json()
print(y)
print(y['short'])
to store the short value as "http://192.168.42.72/ECyKY"
spend hours trying to get it work with Pandas without luck, any hint?
Instead of using response.get.content directly use response.get.json then use Series.str.get to extract the value corresponding to key short from the dictionary and then assign it to new column short:
df['response'] = df['URL'].apply(lambda u: requests.get(u).json())
df['short'] = df['response'].str.get('short')
# print(df)
response short
0 {'error': 0, 'short': 'http://192.168.42.72/EC... http://192.168.42.72/ECyKY
1 {'error': 0, 'short': 'http://192.168.42.72/Is... http://192.168.42.72/IsMgE
The format in the file looks like this
{ 'match' : 'a', 'score' : '2'},{......}
I've tried pd.DataFrame and I've also tried reading it by line but it gives me everything in one cell
I'm new to python
Thanks in advance
Expected result is a pandas dataframe
Try use json_normalize() function
Example:
from pandas.io.json import json_normalize
values = [{'match': 'a', 'score': '2'}, {'match': 'b', 'score': '3'}, {'match': 'c', 'score': '4'}]
df = json_normalize(values)
print(df)
Output:
If one line of your file corresponds to one JSON object, you can do the following:
# import library for working with JSON and pandas
import json
import pandas as pd
# make an empty list
data = []
# open your file and add every row as a dict to the list with data
with open("/path/to/your/file", "r") as file:
for line in file:
data.append(json.loads(line))
# make a pandas data frame
df = pd.DataFrame(data)
If there is more than only one JSON object on one row of your file, then you should find those JSON objects, for example here are two possible options. The solution with the second option would look like this:
# import all you will need
import pandas as pd
import json
from json import JSONDecoder
# define function
def extract_json_objects(text, decoder=JSONDecoder()):
pos = 0
while True:
match = text.find('{', pos)
if match == -1:
break
try:
result, index = decoder.raw_decode(text[match:])
yield result
pos = match + index
except ValueError:
pos = match + 1
# make an empty list
data = []
# open your file and add every JSON object as a dict to the list with data
with open("/path/to/your/file", "r") as file:
for line in file:
for item in extract_json_objects(line):
data.append(item)
# make a pandas data frame
df = pd.DataFrame(data)
I have a Pandas DataFrame with two columns – one with the filename and one with the hour in which it was generated:
File Hour
F1 1
F1 2
F2 1
F3 1
I am trying to convert it to a JSON file with the following format:
{"File":"F1","Hour":"1"}
{"File":"F1","Hour":"2"}
{"File":"F2","Hour":"1"}
{"File":"F3","Hour":"1"}
When I use the command DataFrame.to_json(orient = "records"), I get the records in the below format:
[{"File":"F1","Hour":"1"},
{"File":"F1","Hour":"2"},
{"File":"F2","Hour":"1"},
{"File":"F3","Hour":"1"}]
I'm just wondering whether there is an option to get the JSON file in the desired format. Any help would be appreciated.
The output that you get after DF.to_json is a string. So, you can simply slice it according to your requirement and remove the commas from it too.
out = df.to_json(orient='records')[1:-1].replace('},{', '} {')
To write the output to a text file, you could do:
with open('file_name.txt', 'w') as f:
f.write(out)
In newer versions of pandas (0.20.0+, I believe), this can be done directly:
df.to_json('temp.json', orient='records', lines=True)
Direct compression is also possible:
df.to_json('temp.json.gz', orient='records', lines=True, compression='gzip')
I think what the OP is looking for is:
with open('temp.json', 'w') as f:
f.write(df.to_json(orient='records', lines=True))
This should do the trick.
use this formula to convert a pandas DataFrame to a list of dictionaries :
import json
json_list = json.loads(json.dumps(list(DataFrame.T.to_dict().values())))
Try this one:
json.dumps(json.loads(df.to_json(orient="records")))
convert data-frame to list of dictionary
list_dict = []
for index, row in list(df.iterrows()):
list_dict.append(dict(row))
save file
with open("output.json", mode) as f:
f.write("\n".join(str(item) for item in list_dict))
To transform a dataFrame in a real json (not a string) I use:
from io import StringIO
import json
import DataFrame
buff=StringIO()
#df is your DataFrame
df.to_json(path_or_buf=buff,orient='records')
dfJson=json.loads(buff)
instead of using dataframe.to_json(orient = “records”)
use dataframe.to_json(orient = “index”)
my above code convert the dataframe into json format of dict like {index -> {column -> value}}
Here is small utility class that converts JSON to DataFrame and back: Hope you find this helpful.
# -*- coding: utf-8 -*-
from pandas.io.json import json_normalize
class DFConverter:
#Converts the input JSON to a DataFrame
def convertToDF(self,dfJSON):
return(json_normalize(dfJSON))
#Converts the input DataFrame to JSON
def convertToJSON(self, df):
resultJSON = df.to_json(orient='records')
return(resultJSON)