Is there a way to parse json into the desired output in pandas? - json

I am stuck in my code. I am parsing JSON using pandas, but there is one column I cannot transform. The column, named "TypeValues", holds JSON inside a list. I want to expand it into 4 columns with the corresponding data in the rows. If I use pd.concat to transform it, I get the error message "DataFrame constructor not properly called!". I also tried pd.DataFrame(eval(data)), but that gives me "eval() arg 1 must be a string, bytes or code object".
There is another column, ID. For each ID, I have this type of data in the "TypeValues" column, and I want the transformed data to stay associated with its ID. Does anyone have an idea how I can achieve this? (Note that some rows of "TypeValues" contain [] or are blank.)
The column contents are in the comments section, as I was not allowed to post them directly.
Thanks in advance.
import numpy as np
import pandas as pd
import json
from pathlib import Path
from pandas.io.json import json_normalize
import ast
# Reading the file
data = pd.read_csv('Test.csv')
Data1 = (pd.concat({k: pd.DataFrame(v) for k, v in data['TypeValues'].pop('TypeValues').items()})).reset_index(level=1, drop=True)
This raises the "DataFrame constructor not properly called!" error.
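One possible approach, as a minimal sketch: since the real contents of "TypeValues" are not shown, the sample frame and the keys a and b below are made up, and the cells are assumed to be Python-literal strings such as "[{'a': 1}]". Each cell is parsed with ast.literal_eval, blanks and [] are treated as "no records", and every parsed record is emitted as its own row tagged with its ID.

```python
import ast

import pandas as pd

# Hypothetical stand-in for Test.csv; the real column contents are not shown.
data = pd.DataFrame({
    "ID": [1, 2, 3],
    "TypeValues": [
        "[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]",  # list of dicts stored as text
        "[]",                                     # empty list
        None,                                     # blank cell
    ],
})


def parse_cell(cell):
    """Turn a stringified list of dicts into a real list; blanks become []."""
    if isinstance(cell, str) and cell.strip():
        return ast.literal_eval(cell)
    return []


parsed = data["TypeValues"].apply(parse_cell)

# One output row per (ID, record) pair; IDs with no records produce no rows.
rows = [
    {"ID": id_, **record}
    for id_, records in zip(data["ID"], parsed)
    for record in records
]
result = pd.DataFrame(rows)
print(result)
```

If the cells hold strict JSON (double quotes, true/false/null), json.loads would be the parser to swap in for ast.literal_eval.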

Related

How to use empirical orthogonal function (EOF) to process my data

I would like to apply Empirical Orthogonal Function (EOF) analysis to my lat/long/time/temperature dataset. The first problem I face is converting my .csv data into .nc: I need to obtain three dimensions, but I failed.
Below is my code and what I get:
import pandas as pd
import xarray
new_df =df[['TIME','LAT','LONG','Temperat']].copy()
print("DataFrame Shape:",new_df.shape)
display(new_df.head(5))
xr = xarray.Dataset.from_dataframe(new_df)
xr.to_netcdf('test.nc')
image of the dataset

python api json dict in dataframe

I want to scrape data at the county level from https://apidocs.covidactnow.org.
However, I could only get a dataframe with one row per county, where the data for each date is stored within a dictionary in that row. I would like to access this data and store it in long format (i.e. one row per county-date).
import requests
import pandas as pd
import os
if __name__ == '__main__':
    os.chdir('/home/username/Desktop/')
    url = 'https://api.covidactnow.org/v2/counties.timeseries.json?apiKey=ENTER_YOUR_KEY'
    response = requests.get(url).json()
    data = pd.DataFrame(response)
This seems like a trivial question, but I've tried for hours. What would be the best way to achieve that?
Do you mean something like this?
import requests
url = 'https://api.covidactnow.org/v2/states.timeseries.csv?apiKey=YOURAPIKEY'
response = requests.get(url)
csv_response = (response.text)
# Then you can parse the string as CSV
See this for converting a string to CSV: python parsing string to csv format
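If you would rather stay with the JSON response, pd.json_normalize can unpack a nested list into one row per county-date. This is only a sketch on made-up data: the key name actualsTimeseries and the fields inside it are assumptions for illustration, so check them against the response you actually get.

```python
import pandas as pd

# Hypothetical sample shaped like the response described in the question:
# one entry per county, with the per-date records nested inside it.
response = [
    {
        "fips": "01001",
        "county": "County A",
        "actualsTimeseries": [
            {"date": "2021-01-01", "cases": 10},
            {"date": "2021-01-02", "cases": 12},
        ],
    },
    {
        "fips": "01003",
        "county": "County B",
        "actualsTimeseries": [
            {"date": "2021-01-01", "cases": 5},
        ],
    },
]

# record_path names the nested list to explode into rows;
# meta lists the county-level fields to repeat on every row.
long_df = pd.json_normalize(
    response,
    record_path="actualsTimeseries",
    meta=["fips", "county"],
)
print(long_df)
```

The result has one row per county-date, which is the long format asked for.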

How to navigate through a json file with Python 3? TypeError: list indices must be integers or slices, not str

I am trying to get as many profile links as I can on khanacademy.org, using their API.
I am struggling to navigate through the JSON file to get the desired data.
Here is my code:
from urllib.request import urlopen
import json
with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()
data = json.loads(source)
for item in data['feedback']:
    print(item['authorKaid'])
    profile_answers = item['answers']['authorKaid']
    print(profile_answers)
My goal is to get as many authorKaid values as possible and then store them (to create a database later).
When I run this code I get this error:
TypeError: list indices must be integers or slices, not str
I don't understand why, because in this tutorial video (https://www.youtube.com/watch?v=9N6a-VLBa2I, at 16:10) it works.
The issue is that item['answers'] is a list, and you are trying to index it with a string rather than an integer. That is why item['answers']['authorKaid'] raises the error.
What you really want is
print(item['answers'][0]['authorKaid'])
print(item['answers'][1]['authorKaid'])
print(item['answers'][2]['authorKaid'])
etc...
So you actually want to iterate through those lists. Try this:
from urllib.request import urlopen
import json
with urlopen("https://www.khanacademy.org/api/internal/discussions/video/what-are-algorithms/questions?casing=camel&limit=10&page=0&sort=1&lang=en&_=190422-1711-072ca2269550_1556031278137") as response:
    source = response.read()
data = json.loads(source)
for item in data['feedback']:
    print(item['authorKaid'])
    for each in item['answers']:
        profile_answers = each['authorKaid']
        print(profile_answers)
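Since the goal is to store the IDs later, here is a small offline sketch of the same nested loop that collects them into a set instead of printing. The sample payload and the helper name collect_kaids are made up for illustration; only the keys feedback, answers and authorKaid come from the question.

```python
# Hypothetical payload with the same shape as the API response in the question.
sample = {
    "feedback": [
        {
            "authorKaid": "kaid_1",
            "answers": [{"authorKaid": "kaid_2"}, {"authorKaid": "kaid_3"}],
        },
        {"authorKaid": "kaid_2", "answers": []},  # a question with no answers
    ]
}


def collect_kaids(data):
    """Gather every authorKaid from the questions and their answers."""
    kaids = set()  # a set drops duplicates automatically
    for item in data["feedback"]:
        kaids.add(item["authorKaid"])
        # item["answers"] is a list, so loop over it instead of indexing by key
        for answer in item.get("answers", []):
            kaids.add(answer["authorKaid"])
    return kaids


print(sorted(collect_kaids(sample)))
```

Using a set means an author who both asks and answers is stored only once.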

Python: Build DataFrame from parts of JSON response

I am trying to develop an application to retrieve stock prices (in JSON) and then do some analysis on them. My problem is with getting the JSON response into a pandas DataFrame where I can work. Here is my code:
'''
References
http://stackoverflow.com/questions/6862770/python-3-let-json-object-accept-bytes-or-let-urlopen-output-strings
'''
import json
import pandas as pd
from urllib.request import urlopen
#set API call
url = "https://www.quandl.com/api/v3/datasets/WIKI/AAPL.json?start_date=2017-01-01&end_date=2017-01-31"
#make call and receive response
response = urlopen(url).read().decode('utf8')
dataresponse = json.loads(response)
#check incoming
#print(dataresponse)
df = pd.read_json(dataresponse)
print(df)
The application fails at df = pd.read_json... with the error TypeError: Expected String or Unicode.
So I reckon this is the first hurdle.
The second is getting the data I need. The JSON response contains only two arrays I am interested in, column_names and data. How do I extract only these two and put them into a pandas DataFrame?
To answer your first question, pd.read_json takes a JSON string directly, so you should be doing this:
pd.read_json(response)
But instead, considering how the data is structured, it's best to first convert the JSON string to a dictionary containing the data:
d = json.loads(response)
Then simply build the dataframe from d['dataset']['data'] and d['dataset']['column_names']:
pd.DataFrame(data=d['dataset']['data'], columns=d['dataset']['column_names'])
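For reference, here is the same idea as a self-contained sketch that runs without the network. The payload below is a shortened stand-in with made-up numbers; only the keys dataset, column_names and data come from the question. Setting the Date column as the index is an optional extra step, not something the API requires.

```python
import json

import pandas as pd

# Hypothetical, shortened payload; the prices are invented for illustration.
response = json.dumps({
    "dataset": {
        "column_names": ["Date", "Open", "Close"],
        "data": [
            ["2017-01-31", 121.15, 121.35],
            ["2017-01-30", 120.93, 121.63],
        ],
    }
})

d = json.loads(response)  # JSON string -> nested dict

# Rows come from "data", headers from "column_names".
df = pd.DataFrame(d["dataset"]["data"], columns=d["dataset"]["column_names"])

# Optional: index by date, oldest first, which is handy for time-series work.
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date").sort_index()
print(df)
```

Note that recent pandas versions deprecate passing a literal JSON string to pd.read_json and expect it wrapped in io.StringIO, which is another reason to go through json.loads here.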

Unable to retrieve value of JSON key using python

I am trying to import JSON data from a URL and extract the value of a specific key using Python 2.7. I tried the following:
import urllib
import json
daily_stock = urllib.urlopen('http://www.bloomberg.com/markets/api/bulk-time-series/price/NFLX%3AUS?timeFrame=1_DAY')
stock_json = json.load(daily_stock)
print stock_json
The output is:
[{u'lastPrice': 95.9, u'lastUpdateDate': u'2016-04-22', u'price': [{u'value': 95.45, u'dateTime': u'2016-04-22T13:30:00Z'} ...
u'dateTimeRanges': {u'start': u'2016-04-22T13:30:00Z', u'end': u'2016-04-22T20:30:00Z'}}]
When I try to retrieve the value of 'lastPrice':
print stock_json["lastPrice"]
I get the following error:
TypeError: list indices must be integers, not str
Please help.
stock_json is a list with a single dictionary inside. Get the dictionary by index first:
print stock_json[0]["lastPrice"]
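The same lookup in Python 3 syntax, as an offline sketch: the string below is a trimmed copy of the output shown in the question (the elided parts are simply left out), and the variable names first, last_price and values are only for illustration.

```python
import json

# Trimmed copy of the response printed in the question.
raw = (
    '[{"lastPrice": 95.9, "lastUpdateDate": "2016-04-22", '
    '"price": [{"value": 95.45, "dateTime": "2016-04-22T13:30:00Z"}]}]'
)

stock_json = json.loads(raw)

# The top level is a list, so pick the first element before using string keys.
first = stock_json[0]
last_price = first["lastPrice"]

# "price" is itself a list of dicts, so it needs a loop (or an index) as well.
values = [point["value"] for point in first["price"]]

print(last_price)
print(values)
```

A quick type(stock_json) check tells you whether the next step needs an integer index or a string key.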