In my fetched JSON data, how can I separate out the balance? - json

So, I have been testing block.io api, and so far I have this:
knee = block_io.get_address_balance(labels='shibe1')
s1 = json.dumps(knee)
d2 = json.loads(s1)
print (d2)
It returns this batch of text:
{'status': 'success', 'data': {'network': 'DOGE', 'available_balance': '0.0', 'pending_received_balance': '0.0', 'balances': [{'user_id': 1, 'label': 'shibe1', 'address': 'A9Bda9UMBcb1183PtsBxnbj5QgP6jwkCFG', 'available_balance': '0.00000000', 'pending_received_balance': '0.00000000'}]}}
How can I grab only the available_balance part of it and print just that, instead of all of the JSON data?
EDIT: Please help! I can't find a solution.

Try using some regex.
import re
data="{'status': 'success', 'data': {'network': 'DOGE', 'available_balance': '0.129',
'pending_received_balance': '0.0', 'balances': [{'user_id': 1, 'label': 'shibe1',
'address': 'A9Bda9UMBcb1183PtsBxnbj5QgP6jwkCFG', 'available_balance': '0.00000000',
'pending_received_balance': '0.00000000'}]}}"
pattern = re.compile("(?<=available_balance': ').*?(?=')")
matches = pattern.finditer(data)
for match in matches:
print(match.group())
Breakdown:
import re imports the regex module built into Python.
data="{'status': 'success', 'data': {'network': 'DOGE', 'available_balance': '0.129',
'pending_received_balance': '0.0', 'balances': [{'user_id': 1, 'label': 'shibe1',
'address': 'A9Bda9UMBcb1183PtsBxnbj5QgP6jwkCFG', 'available_balance': '0.00000000',
'pending_received_balance': '0.00000000'}]}}" is a string containing the data to match. You can replace this with the json data.
pattern = re.compile("(?<=available_balance': ').*?(?=')") compiles the regex for finding the data for available balance.
Regex breakdown
(?<=available_balance': ') is a lookbehind, which means the match must be immediately preceded by available_balance': ', so only the balance values are captured.
.*? lazily matches any characters up to the point where the lookahead succeeds.
(?=') is a lookahead, which stops the match just before the next closing single quote without consuming it.
pattern.finditer(data) finds every match of the regex in data.
for match in matches:
    print(match.group()) loops over the matches and prints each one.
If you run this code, you will get the following results:
0.129
0.00000000
If you want the code adapted to your variables, here you go:
import re
pattern = re.compile("(?<=available_balance': ').*?(?=')")
matches = pattern.finditer(str(d2))  # finditer needs a string; str(d2) uses the single quotes the pattern expects
for match in matches:
    print(match.group())
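For completeness, a minimal alternative sketch that skips the regex entirely: d2 is already a Python dictionary after json.loads (and knee was presumably one to begin with, since json.dumps accepted it), so the balances can be read directly by key, based on the structure shown in the question.
# Direct dictionary access, no regex needed
print(d2['data']['available_balance'])               # wallet-level balance
for entry in d2['data']['balances']:                 # per-address balances
    print(entry['label'], entry['available_balance'])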

Related

How to handle the variable size json file in python to create DataFrame using pandas

I am trying to build a DataFrame using pandas, but I am not able to handle the case where the JSON chunks I receive vary in size.
eg:
1st chunk:
{'ad': 0,
'country': 'US',
'ver': '1.0',
'adIdType': 2,
'adValue': '5',
'data': {'eventId': 99,
'clickId': '',
'eventType': 'PURCHASEMADE',
'tms': '2019-12-25T09:57:04+0000',
'productDetails': {'currency': 'DLR',
'productList': [
{'segment': 'Girls',
'vertical': 'Fashion Jewellery',
'brickname': 'Traditional Jewellery',
'price': 8,
'quantity': 10}]},
'transactionId': '1254'},
'appName': 'xer.tt',
'appId': 'XR',
'sdkVer': '1.0.0',
'language': 'en',
'tms': '2022-04-25T09:57:04+0000',
'tid': '124'}
2nd chunk:
{'ad': 0,
'country': 'US',
'ver': '1.0',
'adIdType': 2,
'adValue': '78',
'data': {'eventId': 7,
'clickId': '',
'eventType': 'PURCHASEMADE',
'tms': '20219-02-25T09:57:04+0000',
'productDetails': {'currency': 'DLR',
'productList': [{'segment': 'Boys',
'vertical': 'Fashion',
'brickname': 'Casuals',
'price': 10,
'quantity': 5},
{'segment': 'Girls',
'vertical': 'Fashion Jewellery',
'brickname': 'Traditional Jewellery',
'price': 8,
'quantity': 10}]},
'transactionId': '3258'},
'appName': 'xer.tt',
'appId': 'XR',
'sdkVer': '1.0.0',
'language': 'en',
'tms': '2029-02-25T09:57:04+0000',
'tid': '124'}
Now, in productDetails the number of products changes: the first chunk lists only 1 product and its details, while the 2nd chunk lists 2 products and their details. Further chunks can have ANY number of products (i.e. chunks ~ records).
I tried writing some Python scripts for this but could not come up with a good solution.
PS: If any further detail is required please let me know in the comments.
Thanks!
What you can do is use pd.json_normalize, with the innermost list of records as your record_path and all the other data you are interested in as your meta. Here is an in-depth example of how you could construct that: pandas.io.json.json_normalize with very nested json
In your case, that would for example be (for a single object):
import pandas as pd

df = pd.json_normalize(
    obj,  # one parsed JSON chunk (a dict)
    record_path=["data", "productDetails", "productList"],
    meta=[
        ["data", "productDetails", "currency"],
        ["data", "transactionId"],
        ["data", "clickId"],
        ["data", "eventType"],
        ["data", "tms"],
        "ad",
        "country",
    ],
)
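Since the question is about a variable number of chunks, one way to extend this (a sketch, assuming the parsed chunks are collected in a Python list that I will call chunks) is to normalize each chunk and concatenate the results; json_normalize handles any number of entries in productList per chunk:
import pandas as pd

# chunks: a list of parsed JSON objects shaped like the two examples above
frames = [
    pd.json_normalize(
        chunk,
        record_path=["data", "productDetails", "productList"],
        meta=[["data", "productDetails", "currency"], ["data", "transactionId"], "ad", "country"],  # trim or extend as needed
    )
    for chunk in chunks
]
df = pd.concat(frames, ignore_index=True)  # one row per product, however many each chunk contains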

converting json into pandas dataframe

I have JSON output that I would like to convert to a pandas DataFrame. I downloaded it from a website via HTTPS, using an API key. Thanks much. Here is what I coded:
json_data = vehicle_miles_traveled.json()
print(json_data)
{'request': {'command': 'series', 'series_id': 'STEO.MVVMPUS.A'}, 'series': [{'series_id': 'STEO.MVVMPUS.A', 'name': 'Vehicle Miles Traveled, Annual', 'units': 'million miles/day', 'f': 'A', 'description': 'Includes gasoline and diesel fuel vehicles', 'copyright': 'None', 'source': 'U.S. Energy Information Administration (EIA) - Short Term Energy Outlook', 'geography': 'USA', 'start': '1990', 'end': '2023', 'lastHistoricalPeriod': '2021', 'updated': '2022-03-08T12:39:35-0500', 'data': [['2023', 9247.0281671], ['2022', 9092.4575671], ['2021', 8846.1232877], ['2020', 7933.3907104], ['2019', 8936.3589041], ['2018', 8877.6027397], ['2017', 8800.9479452], ['2016', 8673.2431694], ['2015', 8480.4712329], ['2014', 8289.4684932], ['2013', 8187.0712329], ['2012', 8110.8387978], ['2011', 8083.2931507], ['2010', 8129.4958904], ['2009', 8100.7205479], ['2008', 8124.3387978], ['2007', 8300.8794521], ['2006', 8257.8520548], ['2005', 8190.2136986], ['2004', 8100.5163934], ['2003', 7918.4136986], ['2002', 7823.3123288], ['2001', 7659.2054795], ['2000', 7505.2622951], ['1999', 7340.9808219], ['1998', 7192.7780822], ['1997', 7014.7205479], ['1996', 6781.9699454], ['1995', 6637.7369863], ['1994', 6459.1452055], ['1993', 6292.3424658], ['1992', 6139.7595628], ['1991', 5951.2712329], ['1990', 5883.5643836]]}]}
It largely depends on your final goal. You could add all the metadata to a DataFrame if you want to, but I assume you are mainly interested in reading the data field into a DataFrame.
We can just get those fields by accessing:
import pandas as pd

data = json_data['series'][0]['data']
# ...and pass them to the DataFrame constructor. We can specify the column names as well!
df = pd.DataFrame(data, columns=['year', 'other_col_name'])
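As a possible follow-up (a sketch; million_miles_per_day is a placeholder column name I am choosing to match the units field in the response), the year strings can be converted to integers and used as the index:
df = pd.DataFrame(data, columns=['year', 'million_miles_per_day'])
df['year'] = df['year'].astype(int)        # the API returns years as strings like '2023'
df = df.set_index('year').sort_index()     # chronological order, oldest year first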

can't convert text data to json

I am trying to convert the following (json) string into a python data type:
data = "{'id': 26, 'photo': '/media/f082b5af-ad0.png', 'first_name': 'Islam', 'last_name': 'Mansour', 'email': 'islammansour06+8#gmail.com', 'city': 'Giza', 'cv': '/media/fbb61609-442.pdf', 'reference': 'Facebook', 'campaign': OrderedDict([('id', 2), ('name', 'javascript')]), 'status': 'Invitation Sent', 'user': None, 'at': '2020-01-20', 'time': '23:02:58.359179', 'technologies': [OrderedDict([('id', 46), ('name', 'Django'), ('category', OrderedDict([('id', 24), ('name', 'Framework'), ('_type', 'skill')]))])]}"
I am trying to convert it to JSON by using
json.loads(data.replace("\'", "\""))
but I am getting the following error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 219 (char 218)
The issue is that your data is not valid JSON.
The main problem starts here: [OrderedDict([('id', 46), ('name', 'Django'), ('category', OrderedDict([('id', 24), ('name', 'Framework'), ('_type', 'skill')]))])]}. This looks like a string representation of some Python objects.
Below is a more friendly representation of your JSON data.
I have marked the problematic parts (with **), basically everywhere there is an OrderedDict.
{
   "id": 26,
   "photo": "/media/f082b5af-ad0.png",
   "first_name": "Islam",
   "last_name": "Mansour",
   "email": "islammansour06+8#gmail.com",
   "city": "Giza",
   "cv": "/media/fbb61609-442.pdf",
   "reference": "Facebook",
   "campaign": **OrderedDict**([("id", 2), ("name", "javascript")]),
   "status": "Invitation Sent",
   "user": None,
   "at": "2020-01-20",
   "time": "23:02:58.359179",
   "technologies": [**OrderedDict**([("id", 46), ("name", "Django"), ("category", **OrderedDict**([("id", 24), ("name", "Framework"), ("_type", "skill")]))])]
}
You could try making use of an [online json parser][1] which might give you some friendlier output.
[1]: http://json.parser.online.fr/
As previously said, OrderedDict is not valid JSON, but it is valid Python.
To fix it:
from collections import OrderedDict  # imported directly, because that is how it appears in your string
import json

jsonCorrect = json.dumps(eval(data))
json.loads(jsonCorrect)  # it works
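One caution, which is my own note rather than part of the answer above: eval runs arbitrary Python, so it should only be used on strings you trust. A slightly more constrained sketch limits the names eval can resolve:
import json
from collections import OrderedDict

# Expose only OrderedDict to eval and block access to builtins.
parsed = eval(data, {"__builtins__": {}}, {"OrderedDict": OrderedDict})
clean = json.loads(json.dumps(parsed))  # plain dicts and lists; None becomes null and back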
Not sure why you are adding the replace call. Should work with just the following:
json.loads(data)
You can read about it here.

Why does dask.bag.read_text(filename).map(json.loads) return a list?

I need to read several json.gz files using Dask. I am trying to achieve this by using dask.bag.read_text(filename).map(json.loads), but the output is a nested list (the files contain lists of dictionaries), whereas I would like to get just a list of dictionaries.
I have included a small example that reproduces my problem, below.
import json
import gzip
import dask.bag as db
dict_list = [{'id': 123, 'name': 'lemurt', 'indices': [1,10]}, {'id': 345, 'name': 'katin', 'indices': [2,11]}]
filename = './test.json.gz'
# Write json
with gzip.open(filename, 'wt') as write_file:
    json.dump(dict_list, write_file)
# Read json
with gzip.open(filename, "r") as read_file:
    data = json.load(read_file)
# Read json with Dask
data_dask = db.read_text(filename).map(json.loads).compute()
print(data)
print(data_dask)
I would like to get the first output:
[{'id': 123, 'name': 'lemurt', 'indices': [1, 10]}, {'id': 345, 'name': 'katin', 'indices': [2, 11]}]
But instead I get the second one:
[[{'id': 123, 'name': 'lemurt', 'indices': [1, 10]}, {'id': 345, 'name': 'katin', 'indices': [2, 11]}]]
The read_text function returns a bag where each element is a line of text, so you have a bag of strings. You then parse each of those lines with json.loads, and since each line here contains an entire JSON list, every element becomes a list. So you have a list of lists.
In your case you might use map_partitions, and a function that expects a list containing a single line of text:
b = db.read_text("*.json.gz").map_partitions(lambda L: json.loads(L[0]))
Following the comment by @MRocklin, I ended up solving my problem by changing the way I was writing the json.gz files.
Instead of
with gzip.open(filename, 'wt') as write_file:
    json.dump(dict_list, write_file)
I used
with gzip.open(filename, 'wt') as write_file:
    for dd in dict_list:
        json.dump(dd, write_file)
        write_file.write("\n")
and kept reading the files as
db.read_text(filename).map(json.loads)
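If rewriting the files is not an option, another sketch (my own suggestion, using dask.bag's flatten, which concatenates nested sequences) works with the original one-list-per-file layout:
import json
import dask.bag as db

# Each line parses to a list of dicts; flatten() turns the resulting
# bag of lists into a bag of individual dictionaries.
data_dask = db.read_text('./test.json.gz').map(json.loads).flatten().compute()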

Simple Json decoding with SimpleJSON - Python

I've just started learning Python and I'm having a go at using a Google API. But I hit a brick wall trying to parse the JSON with simplejson.
How do I go about pulling single values (i.e. the product or brand fields) out of this mess below?
{'currentItemCount': 25, 'etag': '"izYJutfqR9tRDg1H4X3fGx1UiCI/hqqZ6pMwV1-CEu5NSqfJO0Ix-gs"', 'id': 'tag:google.com,2010:shopping/products', 'items': [{'id': 'tag:google.com,2010:shopping/products/1196682/8186421160532506003',
'kind': 'shopping#product',
'product': {'author': {'accountId': '1196682',
'name': "Dillard's"},
'brand': 'Merrell',
'condition': 'new',
'country': 'US',
'creationTime': '2011-03-10T08:11:08.000Z',
'description': u'Merrell\'s "Trail Glove" barefoot running shoe lets your feet follow their natural i$
'googleId': '8186421160532506003',
'gtin': '00797240569847',
'images': [{'link': 'http://dimg.dillards.com/is/image/DillardsZoom/03528718_zi_amazon?$product$'}],
'inventories': [{'availability': 'inStock',
'channel': 'online',
'currency': 'USD',
'price': 110.0}],
'language': 'en',
'link': 'http://www.dillards.com/product/Merrell-Mens-Trail-Glove-Barefoot-Running-Shoes_301_-1_301_5$
'modificationTime': '2011-05-25T07:42:51.000Z',
'title': 'Merrell Men\'s "Trail Glove" Barefoot Running Shoes'},
'selfLink': 'https://www.googleapis.com/shopping/search/v1/public/products/1196682/gid/8186421160532506003?alt=js$
The JSON you've pasted in the question is not valid. But once you've fixed that, here's how to use simplejson:
import simplejson as json
your_response_body = '["foo", {"bar":["baz", null, 1.0, 2]}]'
obj = json.loads(your_response_body)
print(obj[1]['bar'])
And a link to the documentation.
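Applied to the structure pasted in the question, the same idea would look roughly like this (a sketch; response_body stands for the raw JSON text returned by the API, a name I am assuming here):
import simplejson as json

result = json.loads(response_body)
for item in result['items']:
    product = item['product']
    print(product['brand'])   # e.g. Merrell
    print(product['title'])   # e.g. Merrell Men's "Trail Glove" Barefoot Running Shoes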