I would like to turn a JSON data structure into a pandas dataframe. My data is retrieved from OpenWeatherMap. The resulting JSON contains multiple nested dictionaries holding weather data on cities, keyed by the URL the data was retrieved from. Here are the last two entries of the JSON, called json_data:
http://api.openweathermap.org/data/2.5/weather?units=Imperial&APPID=a13759df887d2de294c2c7adef912758&q=new norfolk:
{'coord': {'lon': 147.0587, 'lat': -42.7826}, 'weather': [{'id': 803, 'main': 'Clouds', 'description': 'broken clouds', 'icon': '04d'}], 'base': 'stations', 'main': {'temp': 46.04, 'feels_like': 44.29, 'temp_min': 43.92, 'temp_max': 51.93, 'pressure': 1000, 'humidity': 74}, 'visibility': 10000, 'wind': {'speed': 4, 'deg': 319, 'gust': 15.99}, 'clouds': {'all': 77}, 'dt': 1652657623, 'sys': {'type': 2, 'id': 2031307, 'country': 'AU', 'sunrise': 1652649449, 'sunset': 1652684337}, 'timezone': 36000, 'id': 2155415, 'name': 'New Norfolk', 'cod': 200}
http://api.openweathermap.org/data/2.5/weather?units=Imperial&APPID=a13759df887d2de294c2c7adef912758&q=fortuna:
{'coord': {'lon': -124.1573, 'lat': 40.5982}, 'weather': [{'id': 801, 'main': 'Clouds', 'description': 'few clouds', 'icon': '02d'}], 'base': 'stations', 'main': {'temp': 66.67, 'feels_like': 66.18, 'temp_min': 64.98, 'temp_max': 67.93, 'pressure': 1017, 'humidity': 67}, 'visibility': 10000, 'wind': {'speed': 17.27, 'deg': 360}, 'clouds': {'all': 20}, 'dt': 1652657623, 'sys': {'type': 2, 'id': 2040243, 'country': 'US', 'sunrise': 1652619589, 'sunset': 1652671580}, 'timezone': -25200, 'id': 5563839, 'name': 'Fortuna', 'cod': 200}
However, when I turn the JSON into a Pandas Dataframe, only the last dictionary goes into the dataframe.
Here is my code:
pd.set_option('display.max_columns', None)
pd.json_normalize(json_data)
Here is the result (I cannot copy the pandas dataframe directly without losing formatting).
Why is only the last dictionary turned into a dataframe? How can I get a multiple-line dataframe?
If you're only seeing one row in your dataframe, you are probably overwriting json_data with the last value instead of collecting every response in a list.
You can also normalize the weather column separately and join it to the rest:
json_data = [
{'coord': {'lon': 147.0587, 'lat': -42.7826}, 'weather': [{'id': 803, 'main': 'Clouds', 'description': 'broken clouds', 'icon': '04d'}], 'base': 'stations', 'main': {'temp': 46.04, 'feels_like': 44.29, 'temp_min': 43.92, 'temp_max': 51.93, 'pressure': 1000, 'humidity': 74}, 'visibility': 10000, 'wind': {'speed': 4, 'deg': 319, 'gust': 15.99}, 'clouds': {'all': 77}, 'dt': 1652657623, 'sys': {'type': 2, 'id': 2031307, 'country': 'AU', 'sunrise': 1652649449, 'sunset': 1652684337}, 'timezone': 36000, 'id': 2155415, 'name': 'New Norfolk', 'cod': 200},
{'coord': {'lon': -124.1573, 'lat': 40.5982}, 'weather': [{'id': 801, 'main': 'Clouds', 'description': 'few clouds', 'icon': '02d'}], 'base': 'stations', 'main': {'temp': 66.67, 'feels_like': 66.18, 'temp_min': 64.98, 'temp_max': 67.93, 'pressure': 1017, 'humidity': 67}, 'visibility': 10000, 'wind': {'speed': 17.27, 'deg': 360}, 'clouds': {'all': 20}, 'dt': 1652657623, 'sys': {'type': 2, 'id': 2040243, 'country': 'US', 'sunrise': 1652619589, 'sunset': 1652671580}, 'timezone': -25200, 'id': 5563839, 'name': 'Fortuna', 'cod': 200}
]
pd.set_option('display.max_columns', None)
df = pd.json_normalize(json_data)
df = df.loc[:, df.columns!='weather'].join(pd.json_normalize(json_data, record_path='weather', record_prefix='weather.'))
print(df)
Output:
base visibility dt timezone id name cod \
0 stations 10000 1652657623 36000 2155415 New Norfolk 200
1 stations 10000 1652657623 -25200 5563839 Fortuna 200
coord.lon coord.lat main.temp main.feels_like main.temp_min \
0 147.0587 -42.7826 46.04 44.29 43.92
1 -124.1573 40.5982 66.67 66.18 64.98
main.temp_max main.pressure main.humidity wind.speed wind.deg \
0 51.93 1000 74 4.00 319
1 67.93 1017 67 17.27 360
wind.gust clouds.all sys.type sys.id sys.country sys.sunrise \
0 15.99 77 2 2031307 AU 1652649449
1 NaN 20 2 2040243 US 1652619589
sys.sunset weather.id weather.main weather.description weather.icon
0 1652684337 803 Clouds broken clouds 04d
1 1652671580 801 Clouds few clouds 02d
As you can see, both rows are in the dataframe.
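The overwriting problem suspected above can be sketched as follows. The original fetching loop is not shown in the question, so this is only an assumption about what it looks like: `fetch` and `responses` are hypothetical stand-ins for the real `requests.get(url).json()` calls, and the records are trimmed to a few fields.

```python
import pandas as pd

# Hypothetical canned responses standing in for the OpenWeatherMap API calls.
responses = {
    "new norfolk": {"name": "New Norfolk", "main": {"temp": 46.04}, "wind": {"speed": 4}},
    "fortuna": {"name": "Fortuna", "main": {"temp": 66.67}, "wind": {"speed": 17.27}},
}


def fetch(city):
    """Placeholder for requests.get(url).json(); returns a canned response."""
    return responses[city]


# Overwriting: json_data is rebound on every pass, so only the last city survives.
for city in responses:
    json_data = fetch(city)
overwritten = pd.json_normalize(json_data)

# Accumulating: collect every response in a list, then normalize the whole list.
records = []
for city in responses:
    records.append(fetch(city))
accumulated = pd.json_normalize(records)

print(len(overwritten), len(accumulated))
```

If the real loop rebinds json_data the same way, appending to a list (or building it with a list comprehension) should give one row per city.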
I am trying to parse JSON and insert the results into a pandas dataframe.
My JSON looks like
{'result': {'data': [{'dimensions': [{'id': '219876173',
'name': 'Our great product'},
{'id': '2021-03-01', 'name': ''}],
'metrics': [41, 4945]},
{'dimensions': [{'id': '219876173',
'name': 'Our great product'},
{'id': '2021-03-02', 'name': ''}],
'metrics': [31, 2645]},
{'dimensions': [{'id': '219876166',
'name': 'Our awesome product'},
{'id': '2021-03-01', 'name': ''}], ....
So far, I've managed to get to this point:
[{'dimensions': [{'id': '219876173',
'name': 'Our great product'},
{'id': '2021-03-01', 'name': ''}],
'metrics': [41, 4945]},
{'dimensions': [{'id': '219876173',
'name': 'Our great product'},
{'id': '2021-03-02', 'name': ''}],
'metrics': [31, 2645]},
However, when I place it in Pandas I get
dimensions metrics
0 [{'id': '219876173', 'name': 'Our great product... [41, 4945]
1 [{'id': '219876173', 'name': 'Our great product... [31, 2645]
2 [{'id': '219876166', 'name': 'Our awesome product... [27, 2475]
I can now manually split the results into columns using some lambdas:
df = pd.io.json.json_normalize(r.json().get('result').get('data'))
df['delivered_units'] = df['metrics'].apply(lambda x: x[0])
df['revenue'] = df['metrics'].apply(lambda x: x[1])
df['name'] = df['dimensions'].apply(lambda x: x[0])
df['sku'] = df['name'].apply(lambda x: x['name'])
Is there a better way to parse json directly without lambdas?
Look into flatten_json:
data = {'result': {'data': [{'dimensions': [{'id': '219876173',
'name': 'Our great product'},
{'id': '2021-03-01', 'name': ''}],
'metrics': [41, 4945]},
{'dimensions': [{'id': '219876173',
'name': 'Our great product'},
{'id': '2021-03-02', 'name': ''}],
'metrics': [31, 2645]},
{'dimensions': [{'id': '219876166',
'name': 'Our awesome product'},
{'id': '2021-03-01', 'name': ''}]}]}}
from flatten_json import flatten
dic_flattened = (flatten(d, '.') for d in data['result']['data'])
df = pd.DataFrame(dic_flattened)
dimensions.0.id dimensions.0.name dimensions.1.id dimensions.1.name metrics.0 metrics.1
0 219876173 Our great product 2021-03-01 41.0 4945.0
1 219876173 Our great product 2021-03-02 31.0 2645.0
2 219876166 Our awesome product 2021-03-01 NaN NaN
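To see where column names such as dimensions.0.id come from, here is a minimal pure-Python sketch of the flattening idea. It is not the flatten_json implementation; `flatten_sketch` is a hypothetical helper, and it simply drops empty lists and dicts, which the library may handle differently.

```python
def flatten_sketch(obj, sep=".", prefix=""):
    """Recursively flatten nested dicts and lists into a single-level dict.

    Dict keys and list indexes are joined with `sep` to form the new keys.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Leaf value: store it under the key path built so far.
        return {prefix: obj}

    flat = {}
    for key, value in items:
        new_key = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_sketch(value, sep, new_key))
    return flat


record = {
    "dimensions": [
        {"id": "219876173", "name": "Our great product"},
        {"id": "2021-03-01", "name": ""},
    ],
    "metrics": [41, 4945],
}

flat_record = flatten_sketch(record)
print(flat_record)
```

Each flattened record is an ordinary dict, so a list of them can be passed straight to pd.DataFrame, as in the answer above.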
I want to break down a column in a dataframe into multiple columns.
I have a dataframe with the following configuration:
GroupId,SubGroups,Type,Name
-4781505553015217258,"{'GroupId': -732592932641342965, 'SubGroups': [], 'Type': 'DefaultSite', 'Name': 'Default Site'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': 8123255835936628631, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'MERCEDES BENZ'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': -1785570219922840611, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'VOLVO'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': -3670461095557699088, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'SCANIA'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': 8683757391859854416, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'DRIVERS'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': -8066654520755643389, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X - DECOMMISSION'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': 4177323092254043025, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X-INSTALLATION'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': -6088426161802844604, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'FORD'}",OrganisationGroup,CompanyXYZ
-4781505553015217258,"{'GroupId': 8512440039365422841, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'HEAVY VEHICLES'}",OrganisationGroup,CompanyXYZ
I want to create a new dataframe where the SubGroups column is broken into its components. Note that the names from inside the SubGroups column are prefixed with SubGroup_:
GroupId, SubGroup_GroupId, SubGroup_SubGroups, SubGroup_Type, SubGroup_Name, Type, Name
-4781505553015217258, -732592932641342965, [], 'DefaultSite', 'Default Site', OrganisationGroup, CompanyXYZ
-4781505553015217258, 8123255835936628631, [], 'SiteGroup', 'MERCEDES BENZ', OrganisationGroup, CompanyXYZ
I have tried the following code:
for row in AllSubGroupsDF.itertuples():
    newDF = newDF.append(pd.io.json.json_normalize(row.SubGroups))
But it returns
GroupId,SubGroups,Type,Name
-732592932641342965,[],DefaultSite,Default Site
8123255835936628631,[],SiteGroup,MERCEDES BENZ
-1785570219922840611,[],SiteGroup,VOLVO
-3670461095557699088,[],SiteGroup,SCANIA
8683757391859854416,[],SiteGroup,DRIVERS
-8066654520755643389,[],SiteGroup,X - DECOMMISSION
4177323092254043025,[],SiteGroup,X-INSTALLATION
-6088426161802844604,[],SiteGroup,FORD
8512440039365422841,[],SiteGroup,HEAVY VEHICLES
I would like to have it all end up in one dataframe but I'm not sure how. Please help?
You can try using the ast package:
import pandas as pd
import ast
data = [[-4781505553015217258,"{'GroupId': -732592932641342965, 'SubGroups': [], 'Type': 'DefaultSite', 'Name': 'Default Site'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': 8123255835936628631, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'MERCEDES BENZ'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': -1785570219922840611, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'VOLVO'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': -3670461095557699088, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'SCANIA'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': 8683757391859854416, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'DRIVERS'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': -8066654520755643389, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X - DECOMMISSION'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': 4177323092254043025, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'X-INSTALLATION'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': -6088426161802844604, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'FORD'}","OrganisationGroup","CompanyXYZ"],
[-4781505553015217258,"{'GroupId': 8512440039365422841, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'HEAVY VEHICLES'}","OrganisationGroup","CompanyXYZ"]]
df = pd.DataFrame(data,columns=["GroupId","SubGroups","Type","Name"])
df["SubGroup_GroupId"] = df["SubGroups"].map(lambda x: ast.literal_eval(x)["GroupId"])
df["SubGroup_SubGroups"] = df["SubGroups"].map(lambda x: ast.literal_eval(x)["SubGroups"])
df["SubGroup_Type"] = df["SubGroups"].map(lambda x: ast.literal_eval(x)["Type"])
df["SubGroup_Name"] = df["SubGroups"].map(lambda x: ast.literal_eval(x)["Name"])
df
Hope this helps!!
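As a variation on the approach above, each string can be parsed only once and the resulting dicts expanded into prefixed columns in one step. This is a sketch under the assumption that every SubGroups value is a valid Python literal with the same keys; the names `parsed`, `expanded` and `result` are illustrative, and only two of the rows are reproduced.

```python
import ast

import pandas as pd

df = pd.DataFrame(
    [
        [-4781505553015217258,
         "{'GroupId': -732592932641342965, 'SubGroups': [], 'Type': 'DefaultSite', 'Name': 'Default Site'}",
         "OrganisationGroup", "CompanyXYZ"],
        [-4781505553015217258,
         "{'GroupId': 8123255835936628631, 'SubGroups': [], 'Type': 'SiteGroup', 'Name': 'MERCEDES BENZ'}",
         "OrganisationGroup", "CompanyXYZ"],
    ],
    columns=["GroupId", "SubGroups", "Type", "Name"],
)

# Parse every string into a dict exactly once.
parsed = df["SubGroups"].map(ast.literal_eval)

# One column per dict key, aligned on the original index and prefixed.
expanded = pd.DataFrame(parsed.tolist(), index=df.index).add_prefix("SubGroup_")

# Replace the raw string column with the expanded columns.
result = df.drop(columns="SubGroups").join(expanded)
print(result)
```

The expanded columns land after Type and Name here; reorder them with result[[...]] if the exact column order from the question matters.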
I'm trying to use the peopledata API at peopledatalabs.com to retrieve data. I am using the sample python code located at https://docs.peopledatalabs.com/docs/quickstart
which is:
import requests
API_KEY = "YOUR API KEY"  # replace with your own key
###
pdl_url = "https://api.peopledatalabs.com/v4/person?api_key={}&".format(API_KEY)
param_string = "name=sean thorne&company=peopledatalabs.com"
json_response = requests.get(pdl_url + param_string).json()
# OR
pdl_url = "https://api.peopledatalabs.com/v4/person"
params = {
"api_key": API_KEY,
"name": ["sean thorne"],
"company": ["peopledatalabs.com"]
}
json_response = requests.get(pdl_url, params=params).json()
json_response returns:
{'status': 200,
'likelihood': 5,
'data': {'id': 'yj5RUCSORrirXf2sf3gR',
'skills': [{'name': 'social media'},
{'name': 'strategic partnerships'},
{'name': 'public speaking'},
{'name': 'sales'},
{'name': 'photoshop'},
{'name': 'networking'},
{'name': 'mobile marketing'},
{'name': 'start ups'},
{'name': 'business development'},
{'name': 'fundraising'},
{'name': 'seo'},
{'name': 'strategy'},
{'name': 'idea generation'},
{'name': 'enterprise technology sales'},
{'name': 'entrepreneurship'},
{'name': 'social networking'},
{'name': 'creative strategy'},
{'name': 'time management'},
{'name': 'product management'},
{'name': 'social media marketing'},
{'name': 'css'},
{'name': 'https'},
{'name': 'saas'},
{'name': 'management'},
{'name': 'project management'},
{'name': 'public relations'},
{'name': 'marketing communications'},
{'name': 'sales/marketing and strategic partnerships'},
{'name': 'marketing strategy'},
{'name': 'mobile devices'},
{'name': 'installation'},
{'name': 'company culture'},
{'name': 'strategic vision'},
{'name': 'html5'},
{'name': 'hiring'}],
'industries': [{'name': 'computer software', 'is_primary': True}],
'interests': [{'name': 'location based services'},
{'name': 'mobile'},
{'name': 'social media'},
{'name': 'colleges'},
{'name': 'university students'},
{'name': 'consumer internet'},
{'name': 'college campuses'}],
'profiles': [{'network': 'linkedin',
'ids': ['145991517'],
'clean': 'linkedin.com/in/seanthorne',
'aliases': [],
'username': 'seanthorne',
'is_primary': True,
'url': 'http://www.linkedin.com/in/seanthorne'},
{'network': 'linkedin',
'ids': [],
'clean': 'linkedin.com/in/sean-thorne-9b9a8540',
'aliases': ['linkedin.com/pub/sean-thorne/40/a85/9b9'],
'username': 'sean-thorne-9b9a8540',
'is_primary': False,
'url': 'http://www.linkedin.com/in/sean-thorne-9b9a8540'},
{'network': 'twitter',
'ids': [],
'clean': 'twitter.com/seanthorne5',
'aliases': [],
'username': 'seanthorne5',
'url': 'http://www.twitter.com/seanthorne5'},
{'network': 'angellist',
'ids': [],
'clean': 'angel.co/475041',
'aliases': [],
'username': '475041',
'url': 'http://www.angel.co/475041'}],
'emails': [{'address': 'sthorne@uoregon.edu',
'type': None,
'sha256': 'e206e6cd7fa5f9499fd6d2d943dcf7d9c1469bad351061483f5ce7181663b8d4',
'domain': 'uoregon.edu',
'local': 'sthorne'},
{'address': 'sean@peopledatalabs.com',
'type': 'current_professional',
'sha256': '138ea1a7076bb01889af2309de02e8b826c27f022b21ea8cf11aca9285d5a04e',
'domain': 'peopledatalabs.com',
'local': 'sean'}],
'phone_numbers': [{'E164': '+14155688415',
'number': '+14155688415',
'type': None,
'country_code': '1',
'national_number': '4155688415',
'area_code': '415'}],
'birth_date_fuzzy': '1990',
'birth_date': None,
'gender': 'male',
'primary': {'job': {'company': {'name': 'people data labs',
'founded': '2015',
'industry': 'information technology and services',
'location': {'locality': 'san francisco',
'region': 'california',
'country': 'united states'},
'profiles': ['linkedin.com/company/peopledatalabs',
'linkedin.com/company/1640694639'],
'website': 'peopledatalabs.com',
'size': '11-50'},
'locations': [],
'end_date': None,
'start_date': '2015-03',
'title': {'levels': ['owner'],
'name': 'co-founder',
'functions': ['co founder']},
'last_updated': '2019-05-01'},
'location': {'name': 'san francisco, california, united states',
'locality': 'san francisco',
'region': 'california',
'country': 'united states',
'last_updated': '2019-01-01',
'continent': 'north america'},
'name': {'first_name': 'sean',
'middle_name': None,
'last_name': 'thorne',
'clean': 'sean thorne'},
'industry': 'computer software',
'personal_emails': [],
'linkedin': 'linkedin.com/in/seanthorne',
'work_emails': ['sean@peopledatalabs.com'],
'other_emails': ['sthorne@uoregon.edu']},
'names': [{'first_name': 'sean',
'last_name': 'thorne',
'suffix': None,
'middle_name': None,
'middle_initial': None,
'name': 'sean thorne',
'clean': 'sean thorne',
'is_primary': True}],
'locations': [{'name': 'san francisco, california, united states',
'locality': 'san francisco',
'region': 'california',
'subregion': 'city and county of san francisco',
'country': 'united states',
'continent': 'north america',
'type': 'locality',
'geo': '37.77,-122.41',
'postal_code': None,
'zip_plus_4': None,
'street_address': None,
'address_line_2': None,
'most_recent': True,
'is_primary': True,
'last_updated': '2019-01-01'}],
'experience': [{'company': {'name': 'hallspot',
'size': '1-10',
'founded': '2013',
'industry': 'computer software',
'location': {'locality': 'portland',
'region': 'oregon',
'country': 'united states'},
'profiles': ['linkedin.com/company/hallspot',
'twitter.com/hallspot',
'crunchbase.com/organization/hallspot',
'linkedin.com/company/3019184'],
'website': 'hallspot.com'},
'locations': [],
'end_date': '2015-02',
'start_date': '2012-08',
'title': {'levels': ['owner'],
'name': 'co-founder',
'functions': ['co founder']},
'type': None,
'is_primary': False,
'most_recent': False,
'last_updated': None},
{'company': {'name': 'people data labs',
'size': '11-50',
'founded': '2015',
'industry': 'information technology and services',
'location': {'locality': 'san francisco',
'region': 'california',
'country': 'united states'},
'profiles': ['linkedin.com/company/peopledatalabs',
'linkedin.com/company/1640694639'],
'website': 'peopledatalabs.com'},
'locations': [],
'end_date': None,
'start_date': '2015-03',
'title': {'levels': ['owner'],
'name': 'co-founder',
'functions': ['co founder']},
'type': None,
'is_primary': True,
'most_recent': True,
'last_updated': '2019-05-01'}],
'education': [{'school': {'name': 'university of oregon',
'type': 'post-secondary institution',
'location': 'eugene, oregon, united states',
'profiles': ['linkedin.com/edu/university-of-oregon-19207',
'facebook.com/universityoforegon',
'twitter.com/uoregon'],
'website': 'uoregon.edu'},
'end_date': '2014',
'start_date': '2010',
'gpa': None,
'degrees': [],
'majors': ['entrepreneurship'],
'minors': [],
'locations': []}]},
'dataset_version': '7.3'}
While trying to get the phone_numbers field, I have tried:
print(json_response["phone_numbers"])
and got the error code:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-132-2acb0f9f59c5> in <module>()
----> 1 json_response["phone_numbers"]
KeyError: 'phone_numbers'
I am hoping to get the number '+14155688415' as my result
phone_numbers is nested under the data key, so index into data first:
print(json_response["data"]["phone_numbers"])
When dealing with lots of data like that, JSONLint is a good resource to stay organized.
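Since phone_numbers is a list of dictionaries, getting the bare number takes one more step. A minimal sketch follows, using a trimmed copy of the response shown in the question; the `.get` fallbacks are an added assumption, for responses where the field may be absent.

```python
# Trimmed copy of the response from the question.
json_response = {
    "status": 200,
    "data": {
        "phone_numbers": [
            {"E164": "+14155688415", "number": "+14155688415", "type": None}
        ]
    },
}

# Walk down one level at a time; fall back to empty containers if a key is missing.
phone_numbers = json_response.get("data", {}).get("phone_numbers", [])

# Each entry is a dict; pull out its "number" field.
numbers = [entry["number"] for entry in phone_numbers]
first_number = numbers[0] if numbers else None

print(first_number)  # +14155688415
```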
On a Plotly Dash dashboard, I am unable to get my graphs to stretch as the screen grows and then sit side by side in one row once the screen gets big enough. If I use style={"float": "right"} and style={"float": "left"} on the two graphs, they sit side by side, but they no longer stretch with the screen. I have attached a photo of the resulting plots, which are currently stacked over/under. I want them side by side in a large browser window, shrinking in a medium browser window, and over/under in a small browser window.
app = dash.Dash()
app.layout = html.Div([
    dcc.Checklist(
        id='input',
        options=[
            {'label': 'Astoria', 'value': 'AST'},
            {'label': 'Warrenton', 'value': 'WAR'},
            {'label': 'Seaside', 'value': 'SEA'}
        ],
        values=['AST', 'WAR', 'SEA'],
    ),
    html.Div(className='row',
             children=[
                 html.Div(
                     dcc.Graph(id='value-index'),
                     className='col s12 m6',
                 ),
                 html.Div(
                     dcc.Graph(id='rental-index'),
                     className='col s12 m6',
                 )
             ],
             )
])
@app.callback(
    Output('value-index', 'figure'),
    [Input(component_id='input', component_property='values')]
)
def update_graph(input_data):
    return {
        'data': [
            {'x': astoriaValueIndex.index, 'y': astoriaValueIndex.Value, 'type': 'line', 'name': 'Astoria'},
            {'x': warrentonValueIndex.index, 'y': warrentonValueIndex.Value, 'type': 'line', 'name': 'Warrenton'},
            {'x': seasideValueIndex.index, 'y': seasideValueIndex.Value, 'type': 'line', 'name': 'Seaside'},
        ],
        'layout': {
            'title': 'Zillow Value Index'
        }
    }
@app.callback(
    Output('rental-index', 'figure'),
    [Input(component_id='input', component_property='values')]
)
def update_graph(input_data):
    return {
        'data': [
            {'x': astoriaRentalIndex.index, 'y': astoriaRentalIndex.Value, 'type': 'line', 'name': 'Astoria'},
            {'x': warrentonRentalIndex.index, 'y': warrentonRentalIndex.Value, 'type': 'line', 'name': 'Warrenton'},
            {'x': seasideRentalIndex.index, 'y': seasideRentalIndex.Value, 'type': 'line', 'name': 'Seaside'},
        ],
        'layout': {
            'title': 'Zillow Rental Index'
        }
    }
if __name__ == '__main__':
    app.run_server(debug=True)
Did you try the CSS option display: flex on the row?
html.Div(className='row',
         style={'display': 'flex'},
         children=[
             html.Div(
                 dcc.Graph(id='value-index'),
                 className='col s12 m6',
             ),
             html.Div(
                 dcc.Graph(id='rental-index'),
                 className='col s12 m6',
             )
         ],
         )
Normally, this should lay out the page the way you want.
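If the graphs should also drop back to over/under on a small window, the flex row probably needs to be allowed to wrap. The sketch below only builds the style dictionaries; the names `row_style` and `graph_cell_style` and the 400px preferred width are assumptions for illustration, not values from the question.

```python
# Style dictionaries only; pass them as the `style` argument of html.Div.
# Dash style dicts use camelCase CSS property names.
row_style = {
    "display": "flex",
    "flexWrap": "wrap",  # let a cell move to the next line when space runs out
}

graph_cell_style = {
    # grow: 1, shrink: 1, basis: 400px (assumed preferred width per graph)
    "flex": "1 1 400px",
}

# Intended use (not executed here, since it needs the Dash components):
# html.Div(style=row_style, children=[
#     html.Div(dcc.Graph(id='value-index'), style=graph_cell_style),
#     html.Div(dcc.Graph(id='rental-index'), style=graph_cell_style),
# ])

print(row_style, graph_cell_style)
```

With these values the two cells share a row while the container is at least about 800px wide and stack below that; adjust the basis to move the breakpoint.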