How to parse rows of a small DataFrame as JSON strings?

I have a DataFrame df that is the result of some pre-processing. The size of df is around 10,000 rows.
I save this DataFrame in CSV as follows:
df.coalesce(1).write.option("sep",";").option("header","true").csv("output/path")
Now I want to save this DataFrame as a text file in which each row is a JSON string, so the column names should become the attribute names in the JSON strings.
For example:
df =
col1 col2 col3
aa 34 55
bb 13 77
json_txt =
{"col1": "aa", "col2": "34", "col3": "55"}
{"col1": "bb", "col2": "13", "col3": "77"}
Which is the best way to do it?

You can use the write.json API to save a DataFrame in JSON format:
df.coalesce(1).write.json("output path of json file")
The code above creates a JSON file. If you want a text file (of JSON strings) instead, you can use the toJSON API:
df.toJSON.rdd.coalesce(1).saveAsTextFile("output path to text file")
I hope the answer is helpful.
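For intuition, here is what the JSON-lines output of either approach looks like, sketched with only the standard library; the rows below are a stand-in for the example DataFrame, not Spark output:

```python
import json

# stand-in rows mirroring the example DataFrame (column -> value)
rows = [
    {"col1": "aa", "col2": "34", "col3": "55"},
    {"col1": "bb", "col2": "13", "col3": "77"},
]

# one JSON object per line -- the format produced by both
# write.json and toJSON ... saveAsTextFile
json_txt = "\n".join(json.dumps(row) for row in rows)
print(json_txt)
```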

Related

How to loop through json and create a dataframe

I have a JSON file like the one below; how can I make a DataFrame out of it? I want to make the main keys the index and the subkeys the columns.
{
"PACK": {
"labor": "Recycle",
"actual": 0,
"Planned": 2,
"max": 6
},
"SORT": {
"labor": "Mix",
"actual": 10,
"Planned": 4,
"max": 3
}
}
The expected output is something like the table below. I tried to use df.T, but it does not work. Any help on this is appreciated.
actual planned
PACK 0 2
SORT 10 4
You can read your JSON file into a dict, then create a DataFrame with the dict values as data and the dict keys as the index.
import json
import pandas as pd

with open('test.json') as f:
    data = json.load(f)

df = pd.DataFrame(data.values(), index=data.keys())
print(df)
labor actual Planned max
PACK Recycle 0 2 6
SORT Mix 10 4 3
Then select the columns (note that the column name is capitalized as "Planned" in the source JSON, so select it that way and rename it if you want a lowercase header):
df = df[['actual', 'Planned']]
Pandas can read JSON files in many formats. For your use case, the following option should read your data the way you want:
pd.read_json(json_file, orient="index")
More information about the orient option can be found at the official documentation.
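As a self-contained sketch of that option, with the question's data inlined as a string (wrapped in StringIO, since recent pandas versions expect a file path or file-like object):

```python
import io

import pandas as pd

raw = """{
    "PACK": {"labor": "Recycle", "actual": 0, "Planned": 2, "max": 6},
    "SORT": {"labor": "Mix", "actual": 10, "Planned": 4, "max": 3}
}"""

# orient="index" treats each top-level key as one row label
df = pd.read_json(io.StringIO(raw), orient="index")
print(df[["actual", "Planned"]])
```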

How to read with pandas a set of arrays containing one JSON object each?

I am trying to read from a text file into a pandas DataFrame. The text file seems to contain a 2D array of JSON objects; how can I read it?
[[{'metric_name':'CPU','category':'A','data':'9','time_stamp':'2019-03-28 13:15:31'}],[{'metric_name':'Disk','category':'B','data':'56','time_stamp':'2019-03-28 13:15:31'}]]
I expect to have the parameters "metric_name", "category", "data", "time_stamp" as headers
Here is a solution:
import json
import pandas as pd

# load the file (json.load expects valid JSON, i.e. double-quoted strings)
with open('myfile.json') as f:
    raw_data = json.load(f)

# raw_data contains a nested list, so convert it to a simple list
data = [x[0] for x in raw_data]

# then create the dataframe
df = pd.DataFrame.from_records(data)
Here is the content of data. The nested list has been converted to a simple list (assuming that we have one record per inner array):
[{"category": "A",
  "data": "9",
  "metric_name": "CPU",
  "time_stamp": "2019-03-28 13:15:31"},
 {"category": "B",
  "data": "56",
  "metric_name": "Disk",
  "time_stamp": "2019-03-28 13:15:31"}]
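Note that the input shown in the question uses single quotes, which json.load will reject. If the file really looks like that, one option is ast.literal_eval, which parses Python-style literals; a sketch under that assumption:

```python
import ast

import pandas as pd

# the file content from the question, single quotes and all
text = ("[[{'metric_name':'CPU','category':'A','data':'9',"
        "'time_stamp':'2019-03-28 13:15:31'}],"
        "[{'metric_name':'Disk','category':'B','data':'56',"
        "'time_stamp':'2019-03-28 13:15:31'}]]")

raw_data = ast.literal_eval(text)   # parses the Python-style literal
records = [x[0] for x in raw_data]  # flatten the nested list
df = pd.DataFrame.from_records(records)
print(df)
```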

How to convert JSON data to a .tsv file using Python?

My JSON data looks like this:
data ={
"time": "2018-10-02T10:19:48+00:00",
"class": "NOTIFICATION",
"type": "Access Control",
"event": "Window/Door",
"number": -61
}
Desired output have to be like this:
time class type event number
2018-10-02T10:19:48+00:00 NOTIFICATION Access Control Window/Door -61
Could anyone help me out? Thanks in advance.
I think it's the same as converting JSON to CSV, except you use a tab instead of a comma as the delimiter, as follows:
import json
import csv

# read the input; json.load already returns a Python object,
# so no second json.loads call is needed
with open("data.json", "r") as json_file:
    data = json.load(json_file)

with open("data.tsv", "w", newline="") as tsv_file:
    tsv_writer = csv.writer(tsv_file, delimiter='\t')
    tsv_writer.writerow(data[0].keys())  # write the header
    for row in data:  # write data rows
        tsv_writer.writerow(row.values())
The above code works if your JSON file contains a list of data rows. Since your example is a single object, the two lines below are enough:
tsv_writer.writerow(data.keys())    # write the header
tsv_writer.writerow(data.values())  # write the values
Hope this helps.
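A minimal, self-contained version of the single-object case, writing to an in-memory buffer instead of a file so the result is easy to inspect:

```python
import csv
import io

data = {
    "time": "2018-10-02T10:19:48+00:00",
    "class": "NOTIFICATION",
    "type": "Access Control",
    "event": "Window/Door",
    "number": -61,
}

buf = io.StringIO()
tsv_writer = csv.writer(buf, delimiter="\t")
tsv_writer.writerow(data.keys())    # header row
tsv_writer.writerow(data.values())  # value row
tsv = buf.getvalue()
print(tsv)
```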

How to create DataFrame based on multiple JSON files

I have many JSON files inside a folder. All of them have the same structure. Now I want to create the DataFrame, and each JSON file should be the row of this DataFrame.
I know how to create a DataFrame from a single JSON string, but I don't know how to deal with multiple ones:
import spark.implicits._
val jsonStr = """{ "key": 111, "value": 54, "stamp": "aaa"}"""
val df = spark.read.json(Seq(jsonStr).toDS)
Assuming you have your JSON files in the folder src/main/resources, the following code will produce the desired result:
private val df: DataFrame = spark.read.json("src/main/resources")
df.show()
+---+-----+-----+
|key|stamp|value|
+---+-----+-----+
|111| aaa| 54|
|111| aaa| 54|
+---+-----+-----+
Note that the JSON files should be machine-readable rather than human-readable: by default, Spark expects each JSON object on a single line (i.e. no newline characters inside an object). Pretty-printed, multi-line files require the multiLine option.
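If Spark is not a hard requirement, the same one-row-per-file idea can be sketched in plain Python with pandas; the two sample files below are created on the fly purely for illustration:

```python
import glob
import json
import os
import tempfile

import pandas as pd

# create two sample JSON files standing in for the folder contents
folder = tempfile.mkdtemp()
samples = [
    {"key": 111, "value": 54, "stamp": "aaa"},
    {"key": 222, "value": 99, "stamp": "bbb"},
]
for i, record in enumerate(samples):
    with open(os.path.join(folder, f"part{i}.json"), "w") as f:
        json.dump(record, f)

# read every file in the folder: one DataFrame row per file
records = []
for path in sorted(glob.glob(os.path.join(folder, "*.json"))):
    with open(path) as f:
        records.append(json.load(f))

df = pd.DataFrame.from_records(records)
print(df)
```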

How to clean up my JSON response before writing to file in python/django?

This is where the function converts the dictionary to JSON for an HttpResponse:
HttpResponse(
    json.dumps(cm_dict),
    content_type='application/javascript; charset=utf8'
)
I understand that since the content type is set, the JSON response is displayed nicely. But what I want to do is write the JSON to a file in a similar format.
What I get is this:
"{\"a\" : \"b\", \"c\" : \"d\"}"
which is written to the file using the code below:
with open('data.json', 'w') as outfile:
    json.dump(json_data, outfile, sort_keys=True, indent=4, ensure_ascii=False)
What I want is:
{
"a": "b",
"c": "d"
}
It looks like the contents of json_data is already a JSON-encoded string. Calling json.dump on it serializes the string a second time, which is why the quotes come out escaped. Either write the string to the file directly, or parse it with json.loads first and then re-dump it with indent=4.
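A short sketch of the parse-then-re-dump route; the input string here is a stand-in for the question's json_data:

```python
import json

json_data = '{"a" : "b", "c" : "d"}'  # already a JSON-encoded string

obj = json.loads(json_data)  # parse it back into a dict first
pretty = json.dumps(obj, sort_keys=True, indent=4, ensure_ascii=False)
print(pretty)
```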