Simple Json decoding with SimpleJSON - Python - json

Ive just started learning python and Im having a go at using a google api. But I hit a brick wall trying to parse the JSON with simplejson.
How do I go about pulling single values (ie product or brand fields) out of this mess below
{'currentItemCount': 25, 'etag': '"izYJutfqR9tRDg1H4X3fGx1UiCI/hqqZ6pMwV1-CEu5NSqfJO0Ix-gs"', 'id': 'tag:google.com,2010:shopping/products', 'items': [{'id': 'tag:google.com,2010:shopping/products/1196682/8186421160532506003',
'kind': 'shopping#product',
'product': {'author': {'accountId': '1196682',
'name': "Dillard's"},
'brand': 'Merrell',
'condition': 'new',
'country': 'US',
'creationTime': '2011-03-10T08:11:08.000Z',
'description': u'Merrell\'s "Trail Glove" barefoot running shoe lets your feet follow their natural i$
'googleId': '8186421160532506003',
'gtin': '00797240569847',
'images': [{'link': 'http://dimg.dillards.com/is/image/DillardsZoom/03528718_zi_amazon?$product$'}],
'inventories': [{'availability': 'inStock',
'channel': 'online',
'currency': 'USD',
'price': 110.0}],
'language': 'en',
'link': 'http://www.dillards.com/product/Merrell-Mens-Trail-Glove-Barefoot-Running-Shoes_301_-1_301_5$
'modificationTime': '2011-05-25T07:42:51.000Z',
'title': 'Merrell Men\'s "Trail Glove" Barefoot Running Shoes'},
'selfLink': 'https://www.googleapis.com/shopping/search/v1/public/products/1196682/gid/8186421160532506003?alt=js$

The JSON you've pasted in the question is not valid. But when you fixed that here's how to use simplejson:
import simplejson as json
your_response_body = '["foo", {"bar":["baz", null, 1.0, 2]}]'
obj = json.loads(your_response_body)
print(obj[1]['bar'])
And a link to the documentation.

Related

How to handle the variable size json file in python to create DataFrame using pandas

I am trying to build a DataFrame using pandas but I am not able to handle the case when I have the variable size of JSON chunks I am getting.
eg:
1st chunk:
{'ad': 0,
'country': 'US',
'ver': '1.0',
'adIdType': 2,
'adValue': '5',
'data': {'eventId': 99,
'clickId': '',
'eventType': 'PURCHASEMADE',
'tms': '2019-12-25T09:57:04+0000',
'productDetails': {'currency': 'DLR',
'productList': [
{'segment': 'Girls',
'vertical': 'Fashion Jewellery',
'brickname': 'Traditional Jewellery',
'price': 8,
'quantity': 10}]},
'transactionId': '1254'},
'appName': 'xer.tt',
'appId': 'XR',
'sdkVer': '1.0.0',
'language': 'en',
'tms': '2022-04-25T09:57:04+0000',
'tid': '124'}
2nd chunk:
{'ad': 0,
'country': 'US',
'ver': '1.0',
'adIdType': 2,
'adValue': '78',
'data': {'eventId': 7,
'clickId': '',
'eventType': 'PURCHASEMADE',
'tms': '20219-02-25T09:57:04+0000',
'productDetails': {'currency': 'DLR',
'productList': [{'segment': 'Boys',
'vertical': 'Fashion',
'brickname': 'Casuals',
'price': 10,
'quantity': 5},
{'segment': 'Girls',
'vertical': 'Fashion Jewellery',
'brickname': 'Traditional Jewellery',
'price': 8,
'quantity': 10}]},
'transactionId': '3258'},
'appName': 'xer.tt',
'appId': 'XR',
'sdkVer': '1.0.0',
'language': 'en',
'tms': '2029-02-25T09:57:04+0000',
'tid': '124'}
Now in the ProductDetails the number of products are getting changes, in the first chunk we have only 1 product listed and it's detailed but in the 2nd chunk, we have 2 products listed and it's detailed, for further chunks we can have ANY number of products for other chunks also. (i.e. chunks~Records)
I tried doing that by writing some python scripts but was not able to come to any good solution.
PS: If any further detail is required please let me know in the comments.
Thanks!
What you can do, is use pd.json_normalize and have the most "inner" dictionary as your record_path and all other data you are interested in as your meta . Here is an in-depth example how you could construct that: pandas.io.json.json_normalize with very nested json
In your case, that would for example be (for a single object):
df = pd.json_normalize(obj,
record_path=["data", "productDetails", "productList"],
meta=([
["data", "productDetails", "currency"],
["data", "transactionId"],
["data", "clickId"],
["data", "eventType"],
["data", "tms"],
"ad",
"country"
])
)

converting json into pandas dataframe

I have JSON output that I would like to convert to pandas dataframe. I downloaded from a website via HTTPS and utilizing an API key. thanks much. here is what I coded:
json_data = vehicle_miles_traveled.json()
print(json_data)
{'request': {'command': 'series', 'series_id': 'STEO.MVVMPUS.A'}, 'series': [{'series_id': 'STEO.MVVMPUS.A', 'name': 'Vehicle Miles Traveled, Annual', 'units': 'million miles/day', 'f': 'A', 'description': 'Includes gasoline and diesel fuel vehicles', 'copyright': 'None', 'source': 'U.S. Energy Information Administration (EIA) - Short Term Energy Outlook', 'geography': 'USA', 'start': '1990', 'end': '2023', 'lastHistoricalPeriod': '2021', 'updated': '2022-03-08T12:39:35-0500', 'data': [['2023', 9247.0281671], ['2022', 9092.4575671], ['2021', 8846.1232877], ['2020', 7933.3907104], ['2019', 8936.3589041], ['2018', 8877.6027397], ['2017', 8800.9479452], ['2016', 8673.2431694], ['2015', 8480.4712329], ['2014', 8289.4684932], ['2013', 8187.0712329], ['2012', 8110.8387978], ['2011', 8083.2931507], ['2010', 8129.4958904], ['2009', 8100.7205479], ['2008', 8124.3387978], ['2007', 8300.8794521], ['2006', 8257.8520548], ['2005', 8190.2136986], ['2004', 8100.5163934], ['2003', 7918.4136986], ['2002', 7823.3123288], ['2001', 7659.2054795], ['2000', 7505.2622951], ['1999', 7340.9808219], ['1998', 7192.7780822], ['1997', 7014.7205479], ['1996', 6781.9699454], ['1995', 6637.7369863], ['1994', 6459.1452055], ['1993', 6292.3424658], ['1992', 6139.7595628], ['1991', 5951.2712329], ['1990', 5883.5643836]]}]}
It hugely depends on your final goal. You could add all meta-data in a dataframe if you want to. I assume that you are interested in reading the data field into a dataframe.
We can just get those fields by accessing:
data = json_data['series'][0]['data']
# and pass them to the dataframe constructor. We can specify the column names as well!
df = pd.DataFrame(data, columns=['year', 'other_col_name'])

How to parse nested JSON file in Pandas

I'm trying to transform a JSON file generated by the Day One Journal to a text file using Python but hit a brick wall.
This is broadly the format:
{'metadata': {'version': '1.0'},
'entries': [{'richText': '{"meta":{"version":1,"small-lines-removed":true,"created":{"platform":"com.bloombuilt.dayone-mac","version":1344}},"contents":[{"attributes":{"line":{"header":1,"identifier":"F78B28DA-488E-489E-9C95-1A0648099792"}},"text":"2022\\n"},{"attributes":{"line":{"header":0,"identifier":"FA8C6594-F43D-4652-B442-DAF72A379799"}},"text":"\\n"},{"attributes":{"line":{"header":0,"identifier":"0923BCC8-B24A-4C0D-963C-73D09561EECD"}},"text":"It’s the beginning of a new year"},{"embeddedObjects":[{"type":"horizontalRuleLine"}]},{"text":"\\n\\n\\n\\n"},{"embeddedObjects":[{"type":"horizontalRuleLine"}]}]}',
'duration': 0,
'creationOSVersion': '12.1',
'weather': {'sunsetDate': '2022-01-12T16:15:28Z',
'temperatureCelsius': 7,
'weatherServiceName': 'HAMweather',
'windBearing': 230,
'sunriseDate': '2022-01-12T08:00:44Z',
'conditionsDescription': 'Mostly Clear',
'pressureMB': 1042,
'visibilityKM': 48.28020095825195,
'relativeHumidity': 81,
'windSpeedKPH': 6,
'weatherCode': 'clear-night',
'windChillCelsius': 6.699999809265137},
'editingTime': 2925.313938140869,
'timeZone': 'Europe/London',
'creationDeviceType': 'Hal 9000',
'uuid': '988D9D9876624FAEB88F9BCC666FD9CD',
'creationDeviceModel': 'MacBookPro15,2',
'starred': False,
'location': {'region': {'center': {'longitude': -0.0095,
'latitude': 51},
'radius': 75},
'localityName': 'London',
'country': 'United Kingdom',
'timeZoneName': 'Europe/London',
'administrativeArea': 'England',
'longitude': -0.0095,
'placeName': 'Somewhere',
'latitude': 51},
'isPinned': False,
'creationDevice': 'somedevice'...,
}
I only want the 'text' (of which there might be a number of 'text' entries and 'creationDate' so I've got a daily record.
My code to pull out the data is straightforward:
import json
# Opening JSON file
f = open('files/2022.json')
# returns JSON object as
# a dictionary
data = json.load(f)
# Closing file
f.close()
I've tried using list comprensions and then concatenating the Series in Pandas, but two don't match in length - because multiple entries on one day mix up the dataframe.
I wanted to use this code, but:
result = []
for i in data['entries']:
entry = i['creationDate'] + i['text']
result.append(entry)
but I get this error:
KeyError: 'text'
What do I need to do?
Update:
{'richText': '{"meta":{"version":1,"small-lines-removed":true,"created":{"platform":"com.bloombuilt.dayone-mac","version":1344}},"contents":[{"text":"Later than I planned\\n"}]}',
'duration': 0,
'creationOSVersion': '12.1',
'weather': {'sunsetDate': '2022-01-12T16:15:28Z',
'temperatureCelsius': 7,
'weatherServiceName': 'HAMweather',
'windBearing': 230,
'sunriseDate': '2022-01-12T08:00:44Z',
'conditionsDescription': 'Mostly Clear',
'pressureMB': 1042,
'visibilityKM': 48.28020095825195,
'relativeHumidity': 81,
'windSpeedKPH': 6,
'weatherCode': 'clear-night',
'windChillCelsius': 6.699999809265137},
'editingTime': 672.3099998235703,
'timeZone': 'Europe/London',
'creationDeviceType': 'Computer',
'uuid': 'F53DCC5E05BB4106A49C76954117DBF4',
'creationDeviceModel': 'xompurwe',
'isPinned': False,
'creationDevice': 'Computer',
'text': 'Later than I planned \\\n',
'modifiedDate': '2022-01-05T01:01:29Z',
'isAllDay': False,
'creationDate': '2022-01-05T00:39:19Z',
'creationOSName': 'macOS'},
Sort of managed to work a solution - thank you to everyone who helped this morning, particularly #Tomer S.
My solution was:
result = []
for i in data['entries']:
print (i['creationDate'] + i['text'])
result.append(entry)
It still won't get what I want

How to create MultiIndex Dataframe from a nested dictionary (many levels)

I am using the pyflightdata library to search for flight stats. It returns json inside a list of dicts.
Here is an example of the first dictionary in the list after my query:
> flightlog = {'identification': {'number': {'default': 'KE504', 'alternative': 'None'}, 'callsign': 'KAL504', 'codeshare': 'None'}
, 'status': {'live': False, 'text': 'Landed 22:29', 'estimated': 'None', 'ambiguous': False, 'generic': {'status': {'text': 'landed', 'type': 'arrival', 'color': 'green', 'diverted': 'None'}
, 'eventTime': {'utc_millis': 1604611778000, 'utc_date': '20201105', 'utc_time': '2229', 'utc': 1604611778, 'local_millis': 1604615378000, 'local_date': '20201105', 'local_time': '2329', 'local': 1604615378}}}
, 'aircraft': {'model': {'code': 'B77L', 'text': 'Boeing 777-FEZ'}, 'registration': 'HL8075', 'country': {'name': 'South Korea', 'alpha2': 'KR', 'alpha3': 'KOR'}}
, 'airline': {'name': 'Korean Air', 'code': {'iata': 'KE', 'icao': 'KAL'}}
, 'airport': {'origin': {'name': 'London Heathrow Airport', 'code': {'iata': 'LHR', 'icao': 'EGLL'}, 'position': {'latitude': 51.471626, 'longitude': -0.467081, 'country': {'name': 'United Kingdom', 'code': 'GB'}, 'region': {'city': 'London'}}
, 'timezone': {'name': 'Europe/London', 'offset': 0, 'abbr': 'GMT', 'abbrName': 'Greenwich Mean Time', 'isDst': False}}, 'destination': {'name': 'Paris Charles de Gaulle Airport', 'code': {'iata': 'CDG', 'icao': 'LFPG'}, 'position': {'latitude': 49.012516, 'longitude': 2.555752, 'country': {'name': 'France', 'code': 'FR'}, 'region': {'city': 'Paris'}}, 'timezone': {'name': 'Europe/Paris', 'offset': 3600, 'abbr': 'CET', 'abbrName': 'Central European Time', 'isDst': False}}, 'real': 'None'}
, 'time': {'scheduled': {'departure_millis': 1604607300000, 'departure_date': '20201105', 'departure_time': '2115', 'departure': 1604607300, 'arrival_millis': 1604612700000, 'arrival_date': '20201105', 'arrival_time': '2245', 'arrival': 1604612700}, 'real': {'departure_millis': 1604609079000, 'departure_date': '20201105', 'departure_time': '2144', 'departure': 1604609079, 'arrival_millis': 1604611778000, 'arrival_date': '20201105', 'arrival_time': '2229', 'arrival': 1604611778}, 'estimated': {'departure': 'None', 'arrival': 'None'}, 'other': {'eta_millis': 1604611778000, 'eta_date': '20201105', 'eta_time': '2229', 'eta': 1604611778}}}
This dictionary is a huge, multi-nested, json mess and I am struggling to find a way to make it readable. I guess something like this:
identification number default KE504
alternative None
callsign KAL504
codeshare None
status live False
text Landed 22:29
Estimated None
ambiguous False
...
I am trying to turn it into a pandas DataFrame, with mixed results.
In this post it was explained that MultiIndex values have to be tuples, not dictionaries, so I used their example to convert my dictionary:
> flightlog_tuple = {(outerKey, innerKey): values for outerKey, innerDict in flightlog.items() for innerKey, values in innerDict.items()}
Which worked, up to a certain point.
df2 = pd.Series(flightlog_tuple)
gives the following output:
identification number {'default': 'KE504', 'alternative': 'None'}
callsign KAL504
codeshare None
status live False
text Landed 22:29
estimated None
ambiguous False
generic {'status': {'text': 'landed', 'type': 'arrival...
aircraft model {'code': 'B77L', 'text': 'Boeing 777-FEZ'}
registration HL8075
country {'name': 'South Korea', 'alpha2': 'KR', 'alpha...
airline name Korean Air
code {'iata': 'KE', 'icao': 'KAL'}
airport origin {'name': 'London Heathrow Airport', 'code': {'...
destination {'name': 'Paris Charles de Gaulle Airport', 'c...
real None
time scheduled {'departure_millis': 1604607300000, 'departure...
real {'departure_millis': 1604609079000, 'departure...
estimated {'departure': 'None', 'arrival': 'None'}
other {'eta_millis': 1604611778000, 'eta_date': '202...
dtype: object
Kind of what I was going for but some of the indexes are still in the column with values because there are so many levels. So I followed this explanation and tried to add more levels:
level_up = {(level1Key, level2Key, level3Key): values for level1Key, level2Dict in flightlog.items() for level2Key, level3Dict in level2Dict.items() for level3Key, values in level3Dict.items()}
df2 = pd.Series(level_up)
This code gives me AttributeError: 'str' object has no attribute 'items'. I don't understand why the first 2 indexes worked, but the others give an error.
I've tried other methods like MultiIndex.from_tuple or DataFrame.from_dict, but I can't get it to work.
This Dictionary is too complex as a beginner. I don't know what the right approach is. Maybe I am using DataFrames in the wrong way. Maybe there is an easier way to access the data that I am overlooking.
Any help would be much appreciated!

How to extract certain information from a string and create a json object in python

I made a get request to a website and parsed it using BS4 using 'Html.parser'. I want to extract the ID, size and availability from the string. I have parsed it down to this final string:
'{"id":706816278547,"parent_id":81935859731,"available":false,
"sku":"665570057894","featured_image":null,"public_title":null,
"requires_shipping":true,"price":40000,"options":["S"],
"option1":"s","option2":"","option3":"","option4":""},
{"id":707316252691,"parent_id":81935859731,"available":true,
"sku":"665570057900","featured_image":null,"public_title":null,
"requires_shipping":true,"price":40000,"options":["M"],
"option1":"m","option2":"","option3":"", "option4":""},
{"id":707316285459,"parent_id":81935859731,"available":true,
"sku":"665570057917","featured_image":null,"public_title":null,
"requires_shipping":true,"price":40000,"options":["L"],
"option1":"l","option2":"","option3":"","option4":""},`
{"id":707316318227,"parent_id":81935859731,"available":true,`
"sku":"665570057924","featured_image":null,"public_title":null,
"requires_shipping":true,"price":40000,"options":["XL"],
"option1":"xl","option2":"","option3":"","option4":""}'
I also tried using the split() method but I get lost and im unable to extract the needed information without creating a cluttered list and getting lost.
I tried using json.loads() so i could just extract the information needed by calling the key and value pairs but i get the following error
final_id =
'{"id":706816278547,"parent_id":81935859731,"available":false,
"sku":"665570057894","featured_image":null,"public_title":null,
"requires_shipping":true,"price":40000,"options":["S"],
"option1":"s","option2":"","option3":"","option4":""},
{"id":707316252691,"parent_id":81935859731,"available":true,
"sku":"665570057900","featured_image":null,"public_title":null,
"requires_shipping":true,"price":40000,"options":["M"],
"option1":"m","option2":"","option3":"", "option4":""},
{"id":707316285459,"parent_id":81935859731,"available":true,
"sku":"665570057917","featured_image":null,"public_title":null,
"requires_shipping":true,"price":40000,"options":["L"],
"option1":"l","option2":"","option3":"","option4":""},`
{"id":707316318227,"parent_id":81935859731,"available":true,`
"sku":"665570057924","featured_image":null,"public_title":null,
"requires_shipping":true,"price":40000,"options":["XL"],
"option1":"xl","option2":"","option3":"","option4":""}'
find_id = json.loads(final_id)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/anaconda3/lib/python3.7/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 233 (char 232)
I want to create a json object for each ID and Size and if that size is available or not.
Any help is welcomed. Thank you.
First thats not a valid json info
second, json.loads works for files, so a file containing this info will solve the issue because null in json equal None in python so json.load you can say translate a json file so python understand it, so
import json
with open('sof.json', 'r') as stackof:
final_id = json.load(stackof)
print(final_id)
will output
[{'id': 706816278547, 'parent_id': 81935859731, 'available': 'false', 'sku': '665570057894', 'featured_image': None, 'public_title': None, 'requires_shipping': True, 'price': 40000, 'options': ['S'], 'option1': 's', 'option2': '', 'option3': '', 'option4': ''}, {'id': 707316252691, 'parent_id': 81935859731, 'available': True, 'sku': '665570057900', 'featured_image': None, 'public_title': None, 'requires_shipping': True, 'price': 40000, 'options': ['M'], 'option1': 'm', 'option2': '', 'option3': '', 'option4': ''}, {'id': 707316285459, 'parent_id': 81935859731, 'available': True, 'sku': '665570057917', 'featured_image': None, 'public_title': None, 'requires_shipping': True, 'price': 40000, 'options': ['L'], 'option1': 'l', 'option2': '', 'option3': '', 'option4': ''}, {'id': 707316318227, 'parent_id': 81935859731, 'available': True, 'sku': '665570057924', 'featured_image': None, 'public_title': None, 'requires_shipping': True, 'price': 40000, 'options': ['XL'], 'option1': 'xl', 'option2': '', 'option3': '', 'option4': ''}]
i made all of them divided into array, so now if you print the first id you should write
print(final_id[0]['id'])
output:
706816278547
Tell me in the comments if that helped you,
btw click on >> sof.json to see sof.json