write json with format in pyspark - json

I want to write a DataFrame as JSON in PySpark, replicating the way pandas writes JSON:
df.to_json(orient='columns')
which gives:
'{"col 1":{"row 1":"a","row 2":"c"},"col 2":{"row 1":"b","row 2":"d"}}'
But when I use this in AWS Glue:
df.write.mode('overwrite').json(path)
the output is in the records format instead, as if I had used:
df.to_json(orient='records')
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
I looked into the parameters of PySpark's json writer, and there is no orient option to set the JSON format.
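One workaround (a minimal sketch, assuming the DataFrame fits in driver memory; the output path is a placeholder): since Spark's DataFrameWriter.json always writes one JSON object per row, you can convert to pandas, reuse its orient parameter, and write the resulting string yourself.

# Collect to pandas on the driver and reuse pandas' orient support
pdf = df.toPandas()
json_str = pdf.to_json(orient='columns')

# Write the single JSON document with plain Python I/O
# (in Glue you would typically push this to S3 with boto3 instead)
with open('/tmp/output.json', 'w') as f:
    f.write(json_str)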

Related

Nested JSON to dataframe in Scala

I am using Spark/Scala to make an API request and parse the response into a dataframe. The following is the sample JSON response I am using for testing purposes:
API Request/Response
However, I tried to use the following answer from StackOverflow to convert the JSON, but the nested fields are not being processed. Is there any way to convert the JSON string to a dataframe with columns?
I think the problem is that the JSON you attached, when read as a DataFrame, gives a single (very large) row, so Spark might be truncating the result.
If this is what you want, you can try setting the Spark property spark.debug.maxToStringFields to a higher value (the default is 25):
spark.conf.set("spark.debug.maxToStringFields", 100)
However, if you want to process the Results field of the JSON, it would be better to get it as a data frame and then do the processing. Here is how you can do it:
import com.google.gson.JsonParser  // Gson's JSON parser

// Pull the "Results" array out of the raw response and keep it as a string
val results = JsonParser.parseString(<json content>).getAsJsonObject().get("Results").getAsJsonArray.toString

import spark.implicits._
// Wrap the string in a Dataset[String] so Spark can infer the schema
val df = spark.read.json(Seq(results).toDS)
df.show(false)

Converting a JSON Dictionary (currently a String) to a Pandas Dataframe

I am using Python's requests library to retrieve data from a web API. When viewing my data using r.text, it returns a string of a large JSON object, e.g.,
'{"Pennsylvania": {"Sales": [{"date":"2021-12-01", "id": "Metric67", ... '
Naturally, the type of this object is currently a string. What is the best way to convert this string/JSON to a Pandas Dataframe?
r.text returns the JSON as text.
You can use r.json() to get the JSON as a dictionary from requests:
import requests

r = requests.get(YOUR_URL)  # YOUR_URL is a placeholder for the API endpoint
res = r.json()              # parse the response body into a Python dict
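From there, a minimal sketch of getting to a DataFrame (the keys below come from the sample payload above, so adjust them to the real response):

import pandas as pd

# Flatten the nested "Sales" records for one state into rows;
# "Pennsylvania" and "Sales" mirror the sample response shown above
df = pd.json_normalize(res["Pennsylvania"]["Sales"])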

JSON to DataFrame conversion

Are pd.json_normalize() and pd.DataFrame.from_dict() the same, and do they produce similar results?
I am using this to create a dataframe from a JSON dictionary. Please help.
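For what it's worth, a quick sketch on toy data shows they are not the same: json_normalize flattens nested dicts into dotted column names, while from_dict leaves nested values as single object cells.

import pandas as pd

record = {"a": 1, "b": {"c": 2}}

# json_normalize flattens the nested dict: columns are 'a' and 'b.c'
flat = pd.json_normalize(record)

# from_dict does no flattening: column 'b' holds the raw dict
# (list values are used so scalars are accepted without an index)
raw = pd.DataFrame.from_dict({"a": [1], "b": [{"c": 2}]})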

Maintain column order while writing a flat dataframe into json in Databricks using pyspark

Hi, I have to write a flat DF to a JSON file using PySpark on Databricks.
The dataframe has the columns in this order:
identifier|plant|customer|product|quantity|deadline|priority
I used the following function:
df.coalesce(1).write.format('json').save('path')
But the JSON file I get has the columns in alphabetical order.
Does anyone know how to keep the original sequence of columns?
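One possible fix (a sketch; the column names are taken from the question): Spark's json writer emits fields in the DataFrame's schema order, so explicitly re-selecting the columns just before the write should pin the sequence.

# Re-select the columns in the desired order before writing
ordered = ['identifier', 'plant', 'customer', 'product',
           'quantity', 'deadline', 'priority']
df.select(*ordered).coalesce(1).write.format('json').save('path')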

how to convert nested json file into csv in scala

I want to convert my nested JSON into CSV. I used:
df.write.format("com.databricks.spark.csv").option("header", "true").save("mydata.csv")
But it works for flat JSON, not nested JSON. Is there any way I can convert my nested JSON to CSV? Help will be appreciated, thanks!
When you ask Spark to convert a JSON structure to a CSV, Spark can only map the first level of the JSON.
This happens because of the simplicity of the CSV format: it just assigns a value to a name. That is why {"name1":"value1", "name2":"value2"...} can be represented as a CSV with this structure:
name1,name2, ...
value1,value2,...
In your case, you are converting a JSON with several levels, so the Spark exception is saying that it cannot figure out how to convert such a complex structure into a CSV.
If you add only a second level to your JSON, it will work, but be careful: it will drop the names of the second level and include only the values in an array.
You can have a look at this link to see an example for JSON datasets.
As I have no information about the nature of the data, I can't say much more. But if you need to write the information as a CSV, you will need to simplify the structure of your data first, for example as sketched below.
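For illustration, a minimal sketch of that flattening step, written in PySpark for brevity (the field names are hypothetical, not from the question):

from pyspark.sql.functions import col

# Promote hypothetical nested fields to top-level columns so the
# result is flat enough to be written as CSV
flat = df.select(
    col('name'),
    col('address.city').alias('city'),
    col('address.zip').alias('zip'),
)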
Read the JSON file in Spark and create a dataframe:
val path = "examples/src/main/resources/people.json"
val people = sqlContext.read.json(path)
Save the dataframe using spark-csv:
people.write
.format("com.databricks.spark.csv")
.option("header", "true")
.save("newcars.csv")
Sources:
read json
save to csv