I am trying to fetch the value of a fixed key from a list of dictionaries. My data looks like this:
[{'Code': 6861741, 'id': 351, 'name': 'ca-pub'}, {'Code': 'divison', 'id': 567, 'name': 'Magazine Division'}]
Desired output is:
Magazine Division
How do I fetch the value of the name key from the second dictionary in the list?
Thank you.
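A minimal sketch of one way to do this, assuming the string values are quoted so that the data is valid Python, and that the list is bound to a hypothetical name records. Index the list first, then the key:

```python
# Sample data from the question, with string values quoted so it parses.
records = [
    {'Code': 6861741, 'id': 351, 'name': 'ca-pub'},
    {'Code': 'divison', 'id': 567, 'name': 'Magazine Division'},
]

# The second dictionary is at index 1; then look up the 'name' key.
second_name = records[1]['name']

# If the position is not guaranteed, select the record by another field instead.
by_id = next((r['name'] for r in records if r['id'] == 567), None)

print(second_name)
```

The next(...) form returns None rather than raising when no record matches, which may or may not be what you want.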
I am using the pyflightdata library to search for flight stats. It returns JSON as a list of dicts.
Here is an example of the first dictionary in the list after my query:
flightlog = {'identification': {'number': {'default': 'KE504', 'alternative': 'None'}, 'callsign': 'KAL504', 'codeshare': 'None'}
, 'status': {'live': False, 'text': 'Landed 22:29', 'estimated': 'None', 'ambiguous': False, 'generic': {'status': {'text': 'landed', 'type': 'arrival', 'color': 'green', 'diverted': 'None'}
, 'eventTime': {'utc_millis': 1604611778000, 'utc_date': '20201105', 'utc_time': '2229', 'utc': 1604611778, 'local_millis': 1604615378000, 'local_date': '20201105', 'local_time': '2329', 'local': 1604615378}}}
, 'aircraft': {'model': {'code': 'B77L', 'text': 'Boeing 777-FEZ'}, 'registration': 'HL8075', 'country': {'name': 'South Korea', 'alpha2': 'KR', 'alpha3': 'KOR'}}
, 'airline': {'name': 'Korean Air', 'code': {'iata': 'KE', 'icao': 'KAL'}}
, 'airport': {'origin': {'name': 'London Heathrow Airport', 'code': {'iata': 'LHR', 'icao': 'EGLL'}, 'position': {'latitude': 51.471626, 'longitude': -0.467081, 'country': {'name': 'United Kingdom', 'code': 'GB'}, 'region': {'city': 'London'}}
, 'timezone': {'name': 'Europe/London', 'offset': 0, 'abbr': 'GMT', 'abbrName': 'Greenwich Mean Time', 'isDst': False}}, 'destination': {'name': 'Paris Charles de Gaulle Airport', 'code': {'iata': 'CDG', 'icao': 'LFPG'}, 'position': {'latitude': 49.012516, 'longitude': 2.555752, 'country': {'name': 'France', 'code': 'FR'}, 'region': {'city': 'Paris'}}, 'timezone': {'name': 'Europe/Paris', 'offset': 3600, 'abbr': 'CET', 'abbrName': 'Central European Time', 'isDst': False}}, 'real': 'None'}
, 'time': {'scheduled': {'departure_millis': 1604607300000, 'departure_date': '20201105', 'departure_time': '2115', 'departure': 1604607300, 'arrival_millis': 1604612700000, 'arrival_date': '20201105', 'arrival_time': '2245', 'arrival': 1604612700}, 'real': {'departure_millis': 1604609079000, 'departure_date': '20201105', 'departure_time': '2144', 'departure': 1604609079, 'arrival_millis': 1604611778000, 'arrival_date': '20201105', 'arrival_time': '2229', 'arrival': 1604611778}, 'estimated': {'departure': 'None', 'arrival': 'None'}, 'other': {'eta_millis': 1604611778000, 'eta_date': '20201105', 'eta_time': '2229', 'eta': 1604611778}}}
This dictionary is a huge, deeply nested JSON mess and I am struggling to find a way to make it readable. I am aiming for something like this:
identification  number     default      KE504
                           alternative  None
                callsign                KAL504
                codeshare               None
status          live                    False
                text                    Landed 22:29
                estimated               None
                ambiguous               False
...
I am trying to turn it into a pandas DataFrame, with mixed results.
In this post it was explained that MultiIndex values have to be tuples, not dictionaries, so I used their example to convert my dictionary:
flightlog_tuple = {(outerKey, innerKey): values for outerKey, innerDict in flightlog.items() for innerKey, values in innerDict.items()}
Which worked, up to a certain point.
df2 = pd.Series(flightlog_tuple)
gives the following output:
identification  number        {'default': 'KE504', 'alternative': 'None'}
                callsign      KAL504
                codeshare     None
status          live          False
                text          Landed 22:29
                estimated     None
                ambiguous     False
                generic       {'status': {'text': 'landed', 'type': 'arrival...
aircraft        model         {'code': 'B77L', 'text': 'Boeing 777-FEZ'}
                registration  HL8075
                country       {'name': 'South Korea', 'alpha2': 'KR', 'alpha...
airline         name          Korean Air
                code          {'iata': 'KE', 'icao': 'KAL'}
airport         origin        {'name': 'London Heathrow Airport', 'code': {'...
                destination   {'name': 'Paris Charles de Gaulle Airport', 'c...
                real          None
time            scheduled     {'departure_millis': 1604607300000, 'departure...
                real          {'departure_millis': 1604609079000, 'departure...
                estimated     {'departure': 'None', 'arrival': 'None'}
                other         {'eta_millis': 1604611778000, 'eta_date': '202...
dtype: object
This is close to what I was going for, but because there are so many levels, some of the deeper keys are still stuck inside the values column as dictionaries. So I followed this explanation and tried to add more levels:
level_up = {(level1Key, level2Key, level3Key): values for level1Key, level2Dict in flightlog.items() for level2Key, level3Dict in level2Dict.items() for level3Key, values in level3Dict.items()}
df2 = pd.Series(level_up)
This code gives me AttributeError: 'str' object has no attribute 'items'. I don't understand why the first two index levels worked, but adding a third gives an error.
I've tried other methods like MultiIndex.from_tuples or DataFrame.from_dict, but I can't get them to work.
This dictionary is too complex for me as a beginner. I don't know what the right approach is. Maybe I am using DataFrames in the wrong way. Maybe there is an easier way to access the data that I am overlooking.
Any help would be much appreciated!
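For what it's worth, the AttributeError arises because some second-level values, such as 'callsign': 'KAL504', are plain strings rather than dicts, so calling .items() on them fails. The two-level version worked because every top-level value happens to be a dict. A minimal sketch of a recursive alternative, where flatten is a hypothetical helper (not part of pandas or pyflightdata) and the data is a small excerpt of the flight log above:

```python
def flatten(nested, prefix=()):
    """Yield (key_path, value) pairs for every non-dict leaf in a nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Recurse only into dicts; strings, numbers and booleans are leaves.
            yield from flatten(value, path)
        else:
            yield path, value


# A small excerpt of the flight log from the question.
flightlog = {
    'identification': {
        'number': {'default': 'KE504', 'alternative': 'None'},
        'callsign': 'KAL504',
        'codeshare': 'None',
    },
    'status': {'live': False, 'text': 'Landed 22:29'},
}

flat = dict(flatten(flightlog))

# Pad shorter paths with '' so every key tuple has the same number of levels.
depth = max(len(path) for path in flat)
padded = {path + ('',) * (depth - len(path)): value for path, value in flat.items()}

for path, value in padded.items():
    print(path, value)
```

Because every key in padded is a tuple of the same length, passing it to pd.Series should give one index level per nesting level. If dotted column names are acceptable instead of a MultiIndex, pd.json_normalize(flightlog) is another option that needs no helper.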
I'd like to convert an API response into a pandas DataFrame to make it easier to manipulate.
Below is what I've tried so far:
import requests
import pandas as pd
URL = 'https://api.gleif.org/api/v1/lei-records?page[size]=10&page[number]=1&filter[entity.names]=*'
r = requests.get(URL, proxies=proxyDict)
x = r.json()
x
out:
{'meta': {'goldenCopy': {'publishDate': '2020-07-14T00:00:00Z'},
'pagination': {'currentPage': 1,
'perPage': 10,
'from': 1,
'to': 10,
'total': 1675786,
'lastPage': 167579}},
'links': {'first': 'https://api.gleif.org/api/v1/lei-records?filter%5Bentity.names%5D=%2A&page%5Bnumber%5D=1&page%5Bsize%5D=10',
'next': 'https://api.gleif.org/api/v1/lei-records?filter%5Bentity.names%5D=%2A&page%5Bnumber%5D=2&page%5Bsize%5D=10',
'last': 'https://api.gleif.org/api/v1/lei-records?filter%5Bentity.names%5D=%2A&page%5Bnumber%5D=167579&page%5Bsize%5D=10'},
'data': [{'type': 'lei-records',
'id': '254900RR9EUYHB7PI211',
'attributes': {'lei': '254900RR9EUYHB7PI211',
'entity': {'legalName': {'name': 'MedicLights Research Inc.',
'language': None},
'otherNames': [],
'transliteratedOtherNames': [],
'legalAddress': {'language': None,
'addressLines': ['300 Ranee Avenue'],
'addressNumber': None,
'addressNumberWithinBuilding': None,
'mailRouting': None,
'city': 'Toronto',
'region': 'CA-ON',
'country': 'CA',
'postalCode': 'M6A 1N8'},
'headquartersAddress': {'language': None,
'addressLines': ['76 Marble Arch Crescent'],
'addressNumber': None,
'addressNumberWithinBuilding': None,
'mailRouting': None,
'city': 'Toronto',
'region': 'CA-ON',
'country': 'CA',
'postalCode': 'M1R 1W9'},
'registeredAt': {'id': 'RA000079', 'other': None},
'registeredAs': '002185472',
'jurisdiction': 'CA-ON',
'category': None,
'legalForm': {'id': 'O90R', 'other': None},
'associatedEntity': {'lei': None, 'name': None},
'status': 'ACTIVE',
'expiration': {'date': None, 'reason': None},
'successorEntity': {'lei': None, 'name': None},
'otherAddresses': []},
'registration': {'initialRegistrationDate': '2020-07-13T21:09:50Z',
'lastUpdateDate': '2020-07-13T21:09:50Z',
'status': 'ISSUED',
'nextRenewalDate': '2021-07-13T21:09:50Z',
'managingLou': '5493001KJTIIGC8Y1R12',
'corroborationLevel': 'PARTIALLY_CORROBORATED',
'validatedAt': {'id': 'RA000079', 'other': None},
'validatedAs': '002185472'},
'bic': None},
'relationships': {'managing-lou': {'links': {'related': 'https://api.gleif.org/api/v1/lei-records/254900RR9EUYHB7PI211/managing-lou'}},
'lei-issuer': {'links': {'related': 'https://api.gleif.org/api/v1/lei-records/254900RR9EUYHB7PI211/lei-issuer'}},
'direct-parent': {'links': {'reporting-exception': 'https://api.gleif.org/api/v1/lei-records/254900RR9EUYHB7PI211/direct-parent-reporting-exception'}},
'ultimate-parent': {'links': {'reporting-exception': 'https://api.gleif.org/api/v1/lei-records/254900RR9EUYHB7PI211/ultimate-parent-reporting-exception'}}},
'links': {'self': 'https://api.gleif.org/api/v1/lei-records/254900RR9EUYHB7PI211'}},
{'type': 'lei-records',
'id': '254900F9XV2K6IR5TO93',
Then I tried to load it into pandas, which gives me the following result:
f = pd.DataFrame(x['data'])
f
type id attributes relationships links
0 lei-records 254900RR9EUYHB7PI211 {'lei': '254900RR9EUYHB7PI211', 'entity': {'le... {'managing-lou': {'links': {'related': 'https:... {'self': 'https://api.gleif.org/api/v1/lei-rec...
1 lei-records 254900F9XV2K6IR5TO93 {'lei': '254900F9XV2K6IR5TO93', 'entity': {'le... {'managing-lou': {'links': {'related': 'https:... {'self': 'https://api.gleif.org/api/v1/lei-rec...
2 lei-records 254900DIC0729LEXNL12 {'lei': '254900DIC0729LEXNL12', 'entity': {'le... {'managing-lou': {'links': {'related': 'https:... {'self': 'https://api.gleif.org/api/v1/lei-rec...
This isn't the result I expected. I also tried read_json with the code below:
g = pd.read_json(x.text)
g
which gives me the error:
AttributeError: 'dict' object has no attribute 'text'
The expected output should look like this:
lei entity.legalName.name entity.legalAddress.addressLines entity.legalAddress.city entity.legalAddress.postalcode status registration.status
254900RR9EUYHB7PI211 MedicLights Research Inc. 300 Ranee Avenue Toronto M6A 1N8 ACTIVE ISSUED
Thanks to anyone who can help.
Use json_normalize like this:
pd.json_normalize(x['data'])
Another option is to import json_normalize from the pandas.io.json module, which is where older pandas versions expose it.
See also: How to normalize json correctly by Python Pandas
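To illustrate what json_normalize produces, here is a small sketch. It assumes a pandas version that provides the top-level pd.json_normalize, and it uses a trimmed-down record shaped like one element of x['data'] rather than a live API call. The wanted list and the prefix stripping are illustrative choices, not part of the original answer:

```python
import pandas as pd

# A trimmed-down record shaped like one element of x['data'] in the question.
data = [
    {
        'type': 'lei-records',
        'id': '254900RR9EUYHB7PI211',
        'attributes': {
            'lei': '254900RR9EUYHB7PI211',
            'entity': {
                'legalName': {'name': 'MedicLights Research Inc.', 'language': None},
                'legalAddress': {
                    'addressLines': ['300 Ranee Avenue'],
                    'city': 'Toronto',
                    'postalCode': 'M6A 1N8',
                },
                'status': 'ACTIVE',
            },
            'registration': {'status': 'ISSUED'},
        },
    }
]

# Nested dict keys become dotted column names; lists are kept as list values.
df = pd.json_normalize(data)

wanted = [
    'attributes.lei',
    'attributes.entity.legalName.name',
    'attributes.entity.legalAddress.addressLines',
    'attributes.entity.legalAddress.city',
    'attributes.entity.legalAddress.postalCode',
    'attributes.entity.status',
    'attributes.registration.status',
]
result = df[wanted].copy()

# Strip the common 'attributes.' prefix to match the column names in the question.
result.columns = [name[len('attributes.'):] for name in wanted]
print(result.T)
```

Note that addressLines stays a list in its cell. If a plain string is needed, it could be joined afterwards.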
I've got some data out of the Pocket API, and the resulting JSON (under the key list) has some nested JSON within it. Sample below:
{'complete': 1,
'error': None,
'list': {'1992211110': {'authors': {'8683682': {'author_id': '8683682',
'item_id': '1992211110',
'name': 'Robert Kuttner',
'url': 'http://www.nybooks.com/contributors/robert-kuttner/'}},
'excerpt': 'What a splendid era this was going to be, with one remaining superpower spreading capitalism and liberal democracy around the world. Instead, democracy and capitalism seem increasingly incompatible.',
'favorite': '0',
'given_title': '',
'given_url': 'http://nyrevinc.cmail20.com/t/y-l-klpdut-jduhlyklkl-d/',
'has_image': '0',
'has_video': '0',
'is_article': '1',
'is_index': '0',
'item_id': '1992211110',
'resolved_id': '1977788178',
'resolved_title': 'The Man from Red Vienna',
'resolved_url': 'http://www.nybooks.com/articles/2017/12/21/karl-polanyi-man-from-red-vienna/',
'sort_id': 6,
'status': '0',
'time_added': '1520132694',
'time_favorited': '0',
'time_read': '0',
'time_updated': '1520140351',
'word_count': '4009'},
I've managed to get the whole result into a DataFrame, but one column, authors, holds what looks like a nested dictionary. I've managed to split this out into dictionaries keyed by an index, but I can't figure out how to get that into a DataFrame. Sample of authors below:
{1: {'authors': {'8683682': {'author_id': '8683682',
'item_id': '1992211110',
'name': 'Robert Kuttner',
'url': 'http://www.nybooks.com/contributors/robert-kuttner/'}}},
2: {'authors': {'53525958': {'author_id': '53525958',
'item_id': '2086463428',
'name': 'Adam Tooze',
'url': 'http://www.nybooks.com/contributors/adam-tooze/'}}},
3: {'authors': {'3490600': {'author_id': '3490600',
'item_id': '2090266893',
'name': 'Adam Liaw',
'url': ''}}},
4: {'authors': {'75929933': {'author_id': '75929933',
'item_id': '2091894678',
'name': 'umair haque',
'url': 'https://eand.co/#umairh'}}},
5: {'authors': {'61177521': {'author_id': '61177521',
'item_id': '2092663780',
'name': 'Annalisa Merelli',
'url': 'https://qz.com/author/amerelliqz/'}}},
6: {'authors': {'52268529': {'author_id': '52268529',
'item_id': '2092922221',
'name': 'Aditya Chakrabortty',
'url': 'https://www.theguardian.com/profile/adityachakrabortty'}}},
7: {'authors': {'28083': {'author_id': '28083',
'item_id': '2096294305',
'name': 'Alana Semuels',
'url': ''}}},
8: {'authors': {'185472': {'author_id': '185472',
'item_id': '2097100251',
'name': 'TIM KREIDER',
'url': ''}}},
9: {'authors': {'2771923': {'author_id': '2771923',
'item_id': '2098788948',
'name': 'Richard Bernstein',
'url': 'http://www.nybooks.com/contributors/richard-bernstein/'}}},
10: {'authors': {'61111044': {'author_id': '61111044',
'item_id': '2102383890',
'name': 'Ephrat Livni',
'url': 'https://qz.com/author/livniqz/'}}}}
Any help much appreciated, I am very new to python and pandas.
Here is a proposal. You need to pull the inner author records out of your second dictionary before loading them into a DataFrame.
authors_by_index is your second dictionary (named this way to avoid shadowing the built-in input).
# Take the first author record from each entry's 'authors' dict.
authors_filtered = [next(iter(entry['authors'].values())) for entry in authors_by_index.values()]
output = pd.DataFrame(authors_filtered)
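As a variation, here is a sketch that keeps every author of an entry rather than only the first, in case an article has co-authors. It assumes the second dictionary is bound to a hypothetical name authors_by_index and uses two of the sample entries from the question:

```python
# Two entries shaped like the question's second dictionary.
authors_by_index = {
    1: {'authors': {'8683682': {
        'author_id': '8683682',
        'item_id': '1992211110',
        'name': 'Robert Kuttner',
        'url': 'http://www.nybooks.com/contributors/robert-kuttner/'}}},
    2: {'authors': {'53525958': {
        'author_id': '53525958',
        'item_id': '2086463428',
        'name': 'Adam Tooze',
        'url': 'http://www.nybooks.com/contributors/adam-tooze/'}}},
}

# Build one flat record per author, remembering which entry it came from.
rows = [
    {'index': index, **author}
    for index, entry in authors_by_index.items()
    for author in entry.get('authors', {}).values()
]

for row in rows:
    print(row['index'], row['name'])
```

A list of flat dicts like rows can be passed straight to pd.DataFrame(rows) to get one column per field.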
I am trying to use Python to read a specific tag in a JSON file that I got from an API. If the element positions were fixed I would have no problem, but the elements sometimes move around, so I can't rely on the position (the "sequence" number) and have to locate the element by name instead. The name should stay consistent.
Here are the two variants that I have seen so far, but I am sure there could be more variation. So instead of relying on
heroID = data[count]['player'][0]['data'][8]['number']
to extract the value, I would much rather look up the entry whose id is "HeroId" and read its value into a variable.
longer one
[
{'id': 'HeroBattleTag', 'string': 'TFYoDa#1456'},
{'id': 'GameAccount', 'number': 10519139},
{'id': 'HeroClass', 'string': 'monk'},
{'id': 'HeroGender', 'string': 'f'},
{'id': 'HeroLevel', 'number': 70},
{'id': 'ParagonLevel', 'number': 1212},
{'id': 'HeroClanTag', 'string': 'Sc'},
{'id': 'ClanName', 'string': 'Super CasuaI'},
{'id': 'HeroId', 'number': 95443875}
]
shorter one
[
{'id': 'HeroBattleTag', 'string': 'Michael#1920'},
{'id': 'GameAccount', 'number': 96532923},
{'id': 'HeroClass', 'string': 'monk'},
{'id': 'HeroGender', 'string': 'f'},
{'id': 'HeroLevel', 'number': 70},
{'id': 'ParagonLevel', 'number': 1062},
{'id': 'HeroId', 'number': 95441675}
]
I think my question was flawed from the beginning, but I was able to write the code by iterating over the child JSON records while already iterating over the parent ones.
import pprint

for count, entry in enumerate(data):
    character = []
    rank = entry['order']  # ladderRank
    accountId = entry['player'][0]['accountId']  # accountID
    # Look the field up by its 'id' rather than by its position in the list.
    for k in entry['player'][0]['data']:
        if k['id'] == 'HeroId':
            heroID = k['number']
            pprint.pprint(heroID)
            break
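If several fields are needed by name, another possible approach is to turn each list of {'id': ..., 'string' or 'number': ...} entries into a plain dict first. This is only a sketch: fields_to_dict is a hypothetical helper, and it assumes every entry has an 'id' and stores its value under either 'string' or 'number', as in the two samples above:

```python
def fields_to_dict(fields):
    """Map each entry's 'id' to the value stored under 'string' or 'number'."""
    result = {}
    for field in fields:
        # Each entry carries its value under either 'string' or 'number'.
        result[field['id']] = field.get('string', field.get('number'))
    return result


# The two variants from the question.
longer = [
    {'id': 'HeroBattleTag', 'string': 'TFYoDa#1456'},
    {'id': 'GameAccount', 'number': 10519139},
    {'id': 'HeroClass', 'string': 'monk'},
    {'id': 'HeroGender', 'string': 'f'},
    {'id': 'HeroLevel', 'number': 70},
    {'id': 'ParagonLevel', 'number': 1212},
    {'id': 'HeroClanTag', 'string': 'Sc'},
    {'id': 'ClanName', 'string': 'Super CasuaI'},
    {'id': 'HeroId', 'number': 95443875},
]
shorter = [
    {'id': 'HeroBattleTag', 'string': 'Michael#1920'},
    {'id': 'GameAccount', 'number': 96532923},
    {'id': 'HeroClass', 'string': 'monk'},
    {'id': 'HeroGender', 'string': 'f'},
    {'id': 'HeroLevel', 'number': 70},
    {'id': 'ParagonLevel', 'number': 1062},
    {'id': 'HeroId', 'number': 95441675},
]

# .get() returns None for a missing field instead of raising KeyError.
hero_ids = [fields_to_dict(fields).get('HeroId') for fields in (longer, shorter)]
clan_tags = [fields_to_dict(fields).get('HeroClanTag') for fields in (longer, shorter)]
print(hero_ids, clan_tags)
```

The position of HeroId in the list no longer matters, and optional fields such as HeroClanTag simply come back as None when absent.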