Spark dataframe write to JSON showing NULL output - json

I am extracting JSON data from an API and trying to write it to an Azure container path. The data displays correctly in the notebook, but when I write it out as JSON, most of the values are NULL. Where am I going wrong?
import requests
# Import assumed; the post does not show where json_normalize comes from.
from pandas import json_normalize

headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer " + str(token)
}
response_get = requests.get(getURL, headers=headers)
response_final = response_get.json()
print("Type:", type(response_final))
data = json_normalize(response_final)
df = spark.createDataFrame(data)
## df.coalesce(1).write.parquet(stagingpath, mode='overwrite')
df.coalesce(1).write.json(stagingpath, mode='overwrite')

I reproduced this in my environment. Following the Microsoft documentation and a related SO thread, the process below gave the expected results:
import requests
response = requests.get('https://reqres.in/api/users?page=3')
rdd = spark.sparkContext.parallelize([response.text])
df = spark.read.json(rdd)
df.show()
dbutils.fs.mount(
    source="wasbs://mycontainer@myblobstorageaccount.blob.core.windows.net",
    mount_point="/mnt/mymountpoint",
    extra_configs={"fs.azure.sas.mycontainer.myblobstorageaccount.blob.core.windows.net": "SAS"}
)
Then run the script below to write the JSON:
df.coalesce(1).write.json( "/mnt/mymountpoint/vamo.json")
Output: in the storage container, open the vamo.json folder, click the part-00xxx file, then click View/Edit to see the written JSON.
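To check whether nulls really are in the written file, it can help to count them per field. Spark's JSON writer puts one JSON object per line in each part file, so the standard json module can read it. The sketch below is only an illustration: count_nulls and the sample records are hypothetical and do not come from the post.

```python
import json
from collections import Counter


def count_nulls(json_lines):
    """Count, per top-level key, how many records hold a JSON null."""
    nulls = Counter()
    total = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        for key, value in record.items():
            if value is None:
                nulls[key] += 1
    return total, dict(nulls)


# Hypothetical sample standing in for the contents of a part-00xxx file.
sample = [
    '{"id": 1, "email": "a@example.com", "avatar": null}',
    '{"id": 2, "email": null, "avatar": null}',
]
total, nulls = count_nulls(sample)
print(total, nulls)
```

For a real part file, an open file handle can be passed in place of the sample list, since iterating over a file yields its lines.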

Related

Python Request Post Loop through set of Json

I am trying to build a script that takes each JSON object I have and executes a requests.post for it until all of them are sent. I suspect my problem is in writing a loop that handles this. It has been a while since I coded in Python, so any insights will be helpful. Below are my data and code.
new_df = {
    "id": 1,
    "name": "memeone",
    "smartphoneWidth": 0,
    "isHtmlCompatible": true,
    "instancesPerPage": 1,
    "isArchived": false
}, {
    "id": 1,
    "name": "memetwo",
    "smartphoneWidth": 0,
    "isHtmlCompatible": true,
    "instancesPerPage": 1,
    "isArchived": false
}
I realize it is not in a list or within brackets [], but in my experience this is the only way I can get the data to post successfully. Do I need to put it in a dataframe?
Below is the code I am using:
test_df = pd.read_csv('dummy_format_no_id_v4.csv').rename_axis('id')
test_two_df = test_df.reset_index().to_json(orient='records')
test_three_df = test_two_df[1:-1]
new_df = test_three_df
for item in new_df:
    try:
        username = 'username'
        password = 'password'
        headers = {"Content-Type": "application/json; charset=UTF-8"}
        response = requests.post('https://api.someurl.com/1234/thispath', data=new_df, headers=headers, auth=(username, password))
        print(response.text)
    except:
        print('ERROR')
The issue is that it posts the first JSON object ("name": "memeone") successfully but not the next one ("name": "memetwo"). How can I get it to iterate and post the next JSON object as well? Must it be in a dataframe?
Thank you in advance for any advice. Apologies if my code is bad.
The requests package itself provides a json parameter that you can use. Instead of using data or storing the payload in a dataframe, you can do this:
for item in new_df:
    try:
        headers = {"Content-Type": "application/json; charset=UTF-8"}
        response = requests.post(
            f"https://httpbin.org/anything/{new_df}", json=new_df, headers=headers
        )
        # put break statement to terminate the loop
        print(response.text)
        break
    except Exception as e:
        raise Exception(f"Uncaught exception error {e}")
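In the question, new_df is a string, so "for item in new_df" walks over single characters while the whole string is posted on every pass. One possible fix, sketched here under assumptions, is to parse the JSON array into a list of dicts and post each dict on its own. The names post_records, requests_sender and fake_sender are hypothetical; the stand-in sender only lets the loop run without network access.

```python
import json


def post_records(records_json, url, send):
    """POST each record of a JSON array string; return (name, status) pairs."""
    results = []
    for record in json.loads(records_json):  # iterates over dicts, not characters
        response = send(url, record)
        results.append((record["name"], response.status_code))
    return results


def requests_sender(url, record):
    """Real sender: needs the third-party requests package and network access."""
    import requests
    return requests.post(url, json=record, timeout=10)


# Offline demonstration with a stand-in sender (hypothetical, for illustration).
class FakeResponse:
    status_code = 200


sent = []


def fake_sender(url, record):
    sent.append(record)
    return FakeResponse()


records_json = '[{"id": 1, "name": "memeone"}, {"id": 2, "name": "memetwo"}]'
results = post_records(records_json, "https://api.someurl.com/1234/thispath", fake_sender)
print(results)
```

Since to_json(orient='records') already returns an array string of this shape, dropping the [1:-1] slice should be enough to produce the input; passing requests_sender instead of fake_sender would send the real requests.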

How to convert a fetch return object to csv from a fetch API in C3.ai COVID-19 datalake?

I am making some fetch API calls to the C3.ai COVID-19 datalake. How best can I convert that to a csv for easier reading? For reference, I am running the sample code below:
import requests, json
url = "https://api.c3.ai/covid/api/1/outbreaklocation/fetch/"
request_data = {
    "spec": {
        "include": "id,name,population2018",
        "limit": 500
    }
}
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json"
}
response = requests.post(url=url, json=request_data, headers=headers)
fetch_object = json.loads(response.text)
fetch_object is now a Python dict, but I would like to convert it to a CSV. How do I do that generically? I could fetch one or more fields, as specified in the include field of the spec argument.
import json
import pandas as pd

def convert_fetchResult_to_Pandas(fetch_object, required_fields):
    # The fetched records live under the "objs" key of the fetch result.
    fetch_objs = fetch_object["objs"]
    df = pd.read_json(json.dumps(fetch_objs))
    return df[required_fields]
One can then call:
df = convert_fetchResult_to_Pandas(fetch_object, ["id", "name", "population2018"])
csv_string = df.to_csv()
Using pd.json_normalize(fetch_object['objs']) instead of pd.read_json(json.dumps(fetch_object['objs'])) may be worth considering, too, depending on what you're after. It'll flatten any nested dicts in the case objects and separate the variable levels in the column names with dots.
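The dotted-column flattening described above can be sketched with the standard library alone. This is not how pd.json_normalize is implemented; it only shows the same idea for nested dicts. The helper names flatten and objs_to_csv and the sample records are hypothetical.

```python
import csv
import io


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining key levels with dots."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def objs_to_csv(objs):
    """Render a list of (possibly nested) dicts as a CSV string."""
    rows = [flatten(obj) for obj in objs]
    # Collect column names in first-seen order so the header is stable.
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical records shaped like fetch_object["objs"]; the values are made up.
sample_objs = [
    {"id": "A", "name": "Alpha", "location": {"value": {"latitude": 1.5}}},
    {"id": "B", "name": "Beta"},
]
csv_text = objs_to_csv(sample_objs)
print(csv_text)
```

Records that lack a nested field simply get an empty cell in that column, which keeps the CSV rectangular when different fields are included in the spec.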
Consider trying the open source repo c3covid19. You can find the docs here. It is an unofficial Python wrapper for connecting to the C3.ai COVID-19 data lake.
Install
pip install c3covid19
Run
from c3covid19 import c3api
cnx=c3api()
request_data = {
    "spec": {
        "include": "id,name,population2018",
        "limit": 500
    }
}
output = cnx.request(
    data_type='outbreaklocation',
    parameters=request_data,
    api='fetch',
    output_type='csv',
    outfile='./output'
)

Json Data Request from Web using Python

I am new to Python. I am studying web scraping with requests, but I am stuck on getting the data table in JSON format from the link below:
http://hyd-app.rid.go.th/hydro3d.html
Can anyone help?
Thanks
If you right-click and choose Inspect in Chrome or Firefox, the Network tab shows that an XHR request is made to a different URL. You can access that URL directly like this (just update the "data" payload to what you want):
import requests
import json
url = "http://hyd-app.rid.go.th/webservice/getDailyWaterLevelListReport.ashx?option=2"
headers = {'Accept': 'application/json'}
data = {"DW[UtokID]": "3",
        "DW[TimeCurrent]": "23/10/2561",
        "_search": "false",
        "nd": "1540287188732",
        "rows": "1000",
        "page": "1",
        "sidx": "indexcount",
        "sord": "asc"}
r = requests.post(url, data=data, headers=headers)
with open('data.json', 'w') as outfile:
    json.dump(r.json(), outfile)
I dumped it to a file because my terminal cannot display the JSON in the encoding it uses.
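When the payload contains non-ASCII text, such as Thai station names, the dumped file may be easier to read if the characters are written as-is. A minimal sketch, assuming a UTF-8 file is acceptable; the payload below is a made-up stand-in for r.json(), and the temporary path is chosen only for the demonstration.

```python
import json
import os
import tempfile

# Hypothetical payload with Thai text, standing in for r.json().
payload = {"rows": [{"station": "สถานี", "level": 1.25}]}

path = os.path.join(tempfile.mkdtemp(), "data.json")

# ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes;
# encoding="utf-8" makes the file encoding explicit rather than platform-dependent.
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(payload, outfile, ensure_ascii=False, indent=2)

with open(path, encoding="utf-8") as infile:
    raw_text = infile.read()

restored = json.loads(raw_text)
print(restored == payload)
```

With the default ensure_ascii=True the file is still valid JSON and loads back to the same data, but the Thai text appears as escape sequences when opened in an editor.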

Call an API with python

I used Postman to get a bit more information and used its code snippet to run the request. It is working now, but I still want the response printed as JSON rather than as text. I'm really stuck.
import requests
url = "https://ice3x.com/api/v1/stats/marketdepthfull/"
headers = {
    'cache-control': "no-cache",
    'postman-token': "afb97efe-aaf1-cdae-004f-29aac565780a"
}
response = requests.request("GET", url, headers=headers)
print(response.text)
The result I get is:
{"errors":false,"response":{"entities":[{"pair_id":"3","min":"56600.00000000","max":"59600.00000000","avg":"58547.48550813","vol":"5.68526617","pair_name":"btc/zar","last_price":"58547.00000000"},{"pair_id":"4","min":"0.00000000","max":"0.00000000","avg":"0.00000000","vol":"0.00000000","pair_name":"btc/ngn","last_price":"1560000.00000000"},{"pair_id":"6","min":"732.00000000","max":"781.00000000","avg":"765.97099358","vol":"160.77519562","pair_name":"ltc/zar","last_price":"760.00000000"},{"pair_id":"11","min":"4050.00000000","max":"4349.00000000","avg":"4253.33493584","vol":"74.22964491","pair_name":"eth/zar","last_price":"4230.00000000"},{"pair_id":"12","min":"0.00000000","max":"0.00000000","avg":"0.00000000","vol":"0.00000000","pair_name":"eth/ngn","last_price":"110000.00000000"}]},"pagination":{"items_per_page":1000,"total_items":5,"current_page":1,"total_pages":1}}
Process finished with exit code 0
What I need is to be able to dump the JSON and load it again when I need it.
Please help.
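One way to approach this, sketched with the standard json module: parse the response text into Python objects, pretty-print it, write it to a file, and load it back later. The response text below is a shortened copy of the output shown in the question, and the file name is an arbitrary choice.

```python
import json
import os
import tempfile

# Shortened copy of the response text from the question (one entity kept).
response_text = (
    '{"errors":false,"response":{"entities":[{"pair_id":"3",'
    '"pair_name":"btc/zar","last_price":"58547.00000000"}]},'
    '"pagination":{"items_per_page":1000,"total_items":5,'
    '"current_page":1,"total_pages":1}}'
)

# Parse the text into Python objects; with requests, response.json() does the same.
parsed = json.loads(response_text)

# Pretty-print as indented JSON instead of one long line of text.
print(json.dumps(parsed, indent=2))

# Dump to a file, then load it again later.
path = os.path.join(tempfile.mkdtemp(), "marketdepth.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(parsed, f)
with open(path, encoding="utf-8") as f:
    reloaded = json.load(f)

first = reloaded["response"]["entities"][0]
print(first["pair_name"], first["last_price"])
```

Once parsed, the data is an ordinary dict, so individual fields such as the entity list can be read by key rather than searched for in the text.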

Erlang Chicagoboss unable to get the correct JSON response

In my controller file I have a method that reads the incoming HTTP request, reads the user data from the Database, encodes the result in JSON (using jsx) and sends it in response.
sensorusersdetails('GET', []) ->
    Headers = [{'Access-Control-Allow-Origin', "*"},
               {'Access-Control-Allow-Methods', "GET, OPTIONS"},
               {'Content-Type', "application/json"},
               {'Access-Control-Allow-Headers', "X-Requested-With"},
               {'Access-Control-Max-Age', "180"}],
    Building = Req:query_param("bld"),
    io:format("User Data request from Node.js server~n~p~n",
              [Req:query_params()]),
    {{Year,Month,Day},{_,_,_}} = erlang:localtime(),
    StrDate = lists:flatten(io_lib:format("~4..0w-~2..0w-~2..0w",
                                          [Year,Month,Day])),
    BUserDataList = boss_db:find(sensoruser_data, [{building, 'equals', Building}]),
    io:format("Current Users Data stored in the database: ~n~p~n",[BUserDataList]),
    MyUserJSONList = sensor_preapre_data(BUserDataList, StrDate),
    io:format("The Present Date Sensor Users Data with Binary 1: ~n~p~n",[MyUserJSONList]),
    MyUserJSONListLength = length(MyUserJSONList),
    if MyUserJSONListLength > 0 ->
            MyFinalList = sensor_data_final(MyUserJSONList),
            io:format("The Present Date Sensor Users Data without Binary 2: ~n~p~n",[MyFinalList]),
            {200, [MyFinalList], Headers};
            %%{json, MyFinalList};
       true ->
            {200, "NO DATA FOUND", Headers}
            %%{json, [{error, "NO DATA FOUND"}]}
    end.
In the Chicagoboss Server logs I'm getting:
The Present Date Sensor Users Data with Binary 1:
[[<<"{\"username\":\"KPBatman1\",\"building\":\"A\",\"device\":\"Fitbit\",\"date\":\"2017-07-23\",\"calorie\":732,\"distance\":6.4399999999999995,\"elevation\":0,\"floor\":0,\"steps\":8}">>],
[<<"{\"username\":\"KPSuperman1\",\"building\":\"A\",\"device\":\"Jawbone\",\"date\":\"2017-07-23\",\"calorie\":0,\"distance\":0.0,\"elevation\":0,\"floor\":0,\"steps\":0}">>]]
The Present Date Sensor Users Data without Binary 2:
[["{\"username\":\"KPBatman1\",\"building\":\"A\",\"device\":\"Fitbit\",\"date\":\"2017-07-23\",\"calorie\":732,\"distance\":6.4399999999999995,\"elevation\":0,\"floor\":0,\"steps\":8}"],
["{\"username\":\"KPSuperman1\",\"building\":\"A\",\"device\":\"Jawbone\",\"date\":\"2017-07-23\",\"calorie\":0,\"distance\":0.0,\"elevation\":0,\"floor\":0,\"steps\":0}"]]
However, when I send the HTTP request, the JSON response I am getting is:
{"username":"KPBatman1","building":"A","device":"Fitbit","date":"2017-07-23","calorie":732,"distance":6.4399999999999995,"elevation":0,"floor":0,"steps":8}
{"username":"KPSuperman1","building":"A","device":"Jawbone","date":"2017-07-23","calorie":0,"distance":0.0,"elevation":0,"floor":0,"steps":0}
What is the correct way to send JSON response?
However, when I send the HTTP request - the JSON response I am getting:
{"username":"KPBatman1","building":"A", ...}
{"username":"KPSuperman1","building":"A", ...}
And? What did you expect/want to get?
The following code works for me because the output is what I expected to see:
-module(cb_tutorial_greeting_controller, [Req]).
-compile(export_all).

hello('GET', []) ->
    Headers = [
        {'Access-Control-Allow-Origin', "*"},
        {'Access-Control-Allow-Methods', "GET, OPTIONS"},
        {'Content-Type', "application/json"},
        {'Access-Control-Allow-Headers', "X-Requested-With"},
        {'Access-Control-Max-Age', "180"}
    ],
    Data = [
        [<<"{\"username\":\"KPBatman1\",\"building\":\"A\"}">>],
        [<<"{\"username\":\"KPSuperman1\",\"building\":\"A\"}">>]
    ],
    Json = jsx:encode(Data),
    {200, Json, Headers}.
In my browser, I see:
[["{\"username\":\"KPBatman1\",\"building\":\"A\"}"],["{\"username\":\"KPSuperman1\",\"building\":\"A\"}"]]
Note that MyFinalList isn't even valid JSON:
13> Data = [["{\"a\":\"Batman\"}"], ["{\"b\":\"Superman\"}"]].
[["{\"a\":\"Batman\"}"],["{\"b\":\"Superman\"}"]]
14> jsx:is_json(Data).
false
See what I did there?
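The escaped quotes in the browser output above come from encoding strings that are already JSON. The effect is easy to reproduce outside Erlang; the sketch below uses Python's json module purely as an illustration, and it assumes the intended response is a JSON array of objects, which the question does not state.

```python
import json

# Each inner string is already a complete JSON document, as in MyFinalList.
pre_encoded = [
    ['{"username":"KPBatman1","building":"A"}'],
    ['{"username":"KPSuperman1","building":"A"}'],
]

# Encoding the list as-is treats every element as a plain string,
# so the result is an array of arrays of strings with escaped quotes.
double_encoded = json.dumps(pre_encoded)

# Decoding each string first and encoding once yields an array of objects.
objects = [json.loads(text) for (text,) in pre_encoded]
single_encoded = json.dumps(objects)

print(double_encoded)
print(single_encoded)
```

The same principle applies in the controller: build the response from decoded terms and encode once at the end, rather than encoding each record and then wrapping the encoded strings in a list.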