Converting a JSON Dictionary (currently a String) to a Pandas Dataframe - json

I am using Python's requests library to retrieve data from a web API. When I view the data with r.text, I get a string containing a large JSON object, e.g.,
'{"Pennsylvania": {"Sales": [{"date":"2021-12-01", "id": "Metric67", ... '
Naturally, the type of this object is currently a string. What is the best way to convert this string/JSON to a Pandas DataFrame?

r.text returns the JSON as text. You can use r.json() to get it as a dictionary instead:
import requests

r = requests.get(YOUR_URL)
res = r.json()
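From there, a minimal sketch of building the DataFrame (assuming the nested Pennsylvania/Sales layout shown in the question snippet; adjust the keys to match your actual response):
import pandas as pd

# res is the dictionary returned by r.json()
# json_normalize flattens the list of sales records into one row per record
sales = res["Pennsylvania"]["Sales"]   # assumed structure from the question snippet
df = pd.json_normalize(sales)
print(df.head())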

Related

Nested JSON to dataframe in Scala

I am using Spark/Scala to make an API request and parse the response into a dataframe. The following is the sample JSON response I am using for testing purposes:
API Request/Response
However, when I tried to use the following answer from Stack Overflow to convert it, the nested fields were not processed. Is there any way to convert the JSON string to a dataframe with columns?
I think the problem is that the JSON you have attached, when read as a dataframe, gives a single (very large) row, so Spark might be truncating the result when displaying it.
If this is what you want, you can set the Spark property spark.debug.maxToStringFields to a higher value (the default is 25):
spark.conf.set("spark.debug.maxToStringFields", 100)
However, if you want to process the Results from the JSON, it is better to read them into a dataframe and do the processing there. Here is how you can do it:
import com.google.gson.JsonParser
import spark.implicits._

// Pull the "Results" array out of the response and read it as a JSON dataset
val results = JsonParser.parseString(<json content>).getAsJsonObject.get("Results").getAsJsonArray.toString
val df = spark.read.json(Seq(results).toDS)
df.show(false)

Storing data, originally in a dictionary sequence to a dataframe in the json format of a webpage

I'm new to pandas. How do I store data, originally a sequence of dictionaries taken from a webpage's JSON, in a DataFrame?
I am interpreting the question as: you have the URL of the webpage you want to read. Inspect that page and check whether the data you need is available in JSON format. If it is, there will be a URL that serves all the data; we need that URL in the following code.
First, import the required modules.
import pandas as pd
import requests

URL = "url of the webpage having the json file"
r = requests.get(URL)
data = r.json()
Create the dataframe df (use pd.io.json.json_normalize on pandas versions before 1.0).
df = pd.json_normalize(data)
Print the dataframe to check that it contains the data you expect.
print(df)
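For illustration, json_normalize flattens nested fields into dotted column names; the records below are made up purely to show the behaviour, not data from the question.
import pandas as pd

# Hypothetical nested records, just to illustrate the flattening
records = [
    {"state": "Pennsylvania", "sales": {"date": "2021-12-01", "id": "Metric67"}},
    {"state": "Ohio", "sales": {"date": "2021-12-02", "id": "Metric68"}},
]

df = pd.json_normalize(records)
print(df)   # columns: state, sales.date, sales.id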
I hope this answers your question.

write json with format in pyspark

I want to write a data frame as JSON in PySpark, replicating the way pandas writes JSON:
df.to_json(orient='columns')
which gives
'{"col 1":{"row 1":"a","row 2":"c"},"col 2":{"row 1":"b","row 2":"d"}}'
But when I use this in AWS Glue
df.write.mode('overwrite').json(path)
I get the format that pandas produces with
df.to_json(orient='records')
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
I looked at the parameters of the JSON writer in PySpark and there is no orient option to set the JSON format.
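A possible workaround (a sketch, not from the original thread): if the data fits in driver memory, convert the Spark DataFrame to pandas and reuse pandas' orient options; the output path below is a placeholder.
# Collect the Spark DataFrame to the driver so pandas can control the JSON layout.
# Only viable for data small enough to fit in driver memory.
pdf = df.toPandas()
json_str = pdf.to_json(orient='columns')

# In a Glue job you would typically upload json_str to S3 rather than a local file.
with open('output.json', 'w') as f:
    f.write(json_str)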

Attaining the data in json format from the payload which is available as "org.mule.munit.common.util.ReusableByteArrayInputStream#53534c15" in mule 3

I need the real payload JSON data to be able to assert it against another hardcoded JSON file in MUnit (Mule 3.9 and DataWeave 1). The issue is that the payload shows as "org.mule.munit.common.util.ReusableByteArrayInputStream#53534c15". When I convert it to Java I can see the data, but not in JSON format. How can I extract the JSON from this byte array stream to be able to assert it against a hardcoded JSON file?
I resolved it by using the "Byte to String" block.
Then I added the "Assert Equals" block, but made sure to format both values like this:
#[payload.replaceAll("\\s+","")]
#[getResource('sample.json').asString().replaceAll("\\s+","")]
This did exactly what I needed.

Create JSON file from MongoDB document using Python

I am using MongoDB 3.4 and Python 2.7. I have retrieved a document from the database; I can print it, and its structure indicates it is a Python dictionary. I would like to write the content of this document out as a JSON file. When I create a simple dictionary like d = {"one": 1, "two": 2}, I can write it to a file using json.dump(d, open("text.txt", 'w'))
However, if I replace d in the above code with the document I retrieve from MongoDB, I get the error
ObjectId is not JSON serializable
Suggestions?
As you have found out, the issue is that the value of _id is an ObjectId.
The ObjectId class is not understood by the default JSON encoder, so it cannot be serialised. You would get a similar error for ANY Python object the default JSONEncoder does not understand.
One alternative is to write your own custom encoder to serialise ObjectId. However, rather than reinventing the wheel, use the PyMongo/bson utility module bson.json_util.
For example:
from bson import json_util

# json_util.dumps knows how to serialise ObjectId, so write its output directly
with open("text.json", "w") as f:
    f.write(json_util.dumps(d))
The issue is that _id is actually an object and is not natively serialisable. Replacing _id with a string, as in mydocument['_id'] = '123', fixed the issue.
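For completeness, here is a sketch of the custom-encoder alternative mentioned above; it is not from the original answers, and MongoEncoder is just an illustrative name. It falls back to str() for ObjectId values.
import json
from bson import ObjectId

class MongoEncoder(json.JSONEncoder):
    # Convert ObjectId values to strings; defer everything else to the default encoder
    def default(self, obj):
        if isinstance(obj, ObjectId):
            return str(obj)
        return super(MongoEncoder, self).default(obj)

# d is the document retrieved from MongoDB
with open("text.json", "w") as f:
    json.dump(d, f, cls=MongoEncoder)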