I am wondering how to convert the dataframe to JSON format

I am wondering how to convert the dataframe below to JSON format.
name    | type      | count
'james' | 'message' | 4
'kane'  | 'text'    | 3
'james' | 'text'    | 2
'kane'  | 'message' | 3
----------------------------result--------------------------------
dataframe to json format:
data = [
    {'name': 'james', 'message': 4, 'text': 2},
    {'name': 'kane', 'message': 3, 'text': 3}
]
How can I change the dataframe to this JSON data?

You can use the to_json and collect_list functions.
import pyspark.sql.functions as f

# Pack each row into a struct, collect all structs into one array,
# then serialize that array as a JSON string.
df1 = df.withColumn('json', f.struct('name', 'type', 'count')) \
    .groupBy().agg(f.collect_list('json').alias('data')) \
    .withColumn('data', f.to_json(f.struct(f.col('data'))))
df1.show(10, False)
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|data |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"data":[{"name":"james","type":"message","count":4.0},{"name":"kane","type":"text","count":3.0},{"name":"james","type":"text","count":2.0},{"name":"kane","type":"message","count":3.0}]}|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
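Note that the answer above keeps type as a separate field, whereas the desired output in the question pivots the type values ('message', 'text') into keys. If that exact shape is needed, a minimal plain-Python sketch (outside Spark, using the question's four rows) could look like:

```python
import json
from collections import defaultdict

# The four source rows from the question's dataframe.
rows = [
    {"name": "james", "type": "message", "count": 4},
    {"name": "kane",  "type": "text",    "count": 3},
    {"name": "james", "type": "text",    "count": 2},
    {"name": "kane",  "type": "message", "count": 3},
]

# Pivot the 'type' values into keys so each name yields a single object.
pivoted = defaultdict(dict)
for r in rows:
    pivoted[r["name"]][r["type"]] = r["count"]

data = [{"name": name, **types} for name, types in pivoted.items()]
print(json.dumps(data))
```

In Spark itself, a groupBy('name').pivot('type') aggregation would do the equivalent reshaping before serializing.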

Related

Pandas converts string-typed JSON value to INT

I have list of objects as JSON. Each object has two properties: id(string) and arg(number).
When I use pandas.read_json(...), the resulting DataFrame has the id interpreted as a number as well, which causes problems since information is lost.
import pandas as pd
json = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'
df = pd.read_json(json)
I'd expect to have a DataFrame like this:
id arg
0 "1" 1
1 "1_1" 2
2 "11" 2
I get
id arg
0 1 1
1 11 2
2 11 2
and suddenly, the once unique id is not so unique anymore.
How can I tell pandas to stop doing that?
My search so far only yielded results where people were trying to achieve the opposite: having columns of strings interpreted as numbers. That is exactly what I don't want in this case!
If you set the dtype parameter to False, read_json will not infer the types automatically:
df = pd.read_json(json, dtype=False)
Use the dtype parameter to prevent casting id to numbers:
df = pd.read_json(json, dtype={'id':str})
print (df)
id arg
0 1 1
1 1_1 2
2 11 2
print (df.dtypes)
id object
arg int64
dtype: object
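As a quick self-contained check of the dtype approach (wrapping the JSON literal in StringIO, which newer pandas versions prefer over passing a raw string):

```python
import io
import pandas as pd

json_str = '[{"id": "1", "arg": 1}, {"id": "1_1", "arg": 2}, {"id": "11", "arg": 2}]'

# dtype={'id': str} keeps the id column as strings instead of inferred ints
# ("1_1" would otherwise be read as the numeric literal 11).
df = pd.read_json(io.StringIO(json_str), dtype={"id": str})
print(df["id"].tolist())
```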

how to convert a list of dataframe to json in python

I want to convert the dataframes below to JSON.
Salary :
Balance before Salary Salary
Date
Jun-18 27.20 15300.0
Jul-18 88.20 15300.0
Aug-18 176.48 14783.0
Sep-18 48.48 16249.0
Oct-18 241.48 14448.0
Nov-18 49.48 15663.0
Balance :
Balance
Date
Jun-18 3580.661538
Jul-18 6817.675556
Aug-18 7753.483077
Sep-18 5413.868421
Oct-18 5996.120000
Nov-18 8276.805000
Dec-18 9269.000000
I tried:
dfs = [Salary, Balance]
dfs.to_json("path/test.json")
but it gives me an error:
AttributeError: 'list' object has no attribute 'to_json'
but when I tried for single dataframe, I got the following result:
{"Balance before Salary":{"Jun-18":27.2,"Jul-18":88.2,"Aug-18":176.48,"Sep-18":48.48,"Oct-18":241.48,"Nov-18":49.48},"Salary":{"Jun-18":15300.0,"Jul-18":15300.0,"Aug-18":14783.0,"Sep-18":16249.0,"Oct-18":14448.0,"Nov-18":15663.0}}
You can use to_json method.
From the docs:
>>> df = pd.DataFrame([['a', 'b'], ['c', 'd']],
... index=['row 1', 'row 2'],
... columns=['col 1', 'col 2'])
>>> df.to_json(orient='records')
'[{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":"d"}]'
Use concat to combine them into one DataFrame (the same index values are needed in each DataFrame for alignment), then convert to JSON:
import numpy as np
import pandas as pd

dfs = [Salary, Balance]
# Concatenate side by side; numeric keys keep duplicate column names apart.
df = pd.concat(dfs, axis=1, keys=np.arange(len(dfs)))
df.columns = ['{}{}'.format(b, a) for a, b in df.columns]
df.to_json("path/test.json")
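To make the concat idea concrete, here is a small sketch with hypothetical two-month slices of the question's Salary and Balance frames (values abbreviated), serializing to a string instead of a file:

```python
import json
import pandas as pd

# Abbreviated slices of the frames from the question, indexed by Date.
salary = pd.DataFrame(
    {"Balance before Salary": [27.20, 88.20], "Salary": [15300.0, 15300.0]},
    index=["Jun-18", "Jul-18"],
)
balance = pd.DataFrame({"Balance": [3580.66, 6817.68]}, index=["Jun-18", "Jul-18"])

# concat aligns the frames on their shared Date index, giving one frame to serialize.
df = pd.concat([salary, balance], axis=1)
out = df.to_json()
print(out)
```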

Can I save a Dataframe as pretty format JSON in Spark Scala?

While reading a CSV file as a Dataframe in Spark Scala, can we save the file as pretty-format JSON with root tags?
I have following df:
+------+-----+
|number| word|
+------+-----+
| 8| bat|
| 64|mouse|
| -27|horse|
+------+-----+
If you want to create a root element you can use the following approach:
1. Create a function which will convert your DF to a DF with a JSON column:
import scala.util.parsing.json.JSONObject  // also requires spark.implicits._ in scope for df.map

def convertDFToJSON(df: DataFrame): DataFrame = {
  val columns = df.columns
  val outDF = df.map(row =>
    "myroot : " +
      JSONObject(row.getValuesMap(columns)).toString()
  )
  outDF.toDF("bla")
}
2. Apply a function on your DF:
val test1 = convertDFToJSON(someDF)
+--------------------+
| bla|
+--------------------+
|myroot : {"number...|
|myroot : {"number...|
|myroot : {"number...|
+--------------------+
3. Write a DF as a text:
test1.write.text("/tmp/some")
output:
myroot : {"number" : 8, "word" : "bat"}
myroot : {"number" : 64, "word" : "mouse"}
myroot : {"number" : -27, "word" : "horse"}
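If the goal is simply one pretty-printed JSON document under a root element (rather than Spark's line-delimited text output), the rows can be collected to the driver and serialized there; a minimal sketch with the Python standard library, using the rows from the question:

```python
import json

# The rows of the question's DataFrame, collected into plain dicts.
rows = [
    {"number": 8, "word": "bat"},
    {"number": 64, "word": "mouse"},
    {"number": -27, "word": "horse"},
]

# Wrap everything under a single root key and pretty-print with indentation.
pretty = json.dumps({"myroot": rows}, indent=2)
print(pretty)
```

This only works for data small enough to collect to one machine; for large data, the line-delimited approach above is the scalable option.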

How to export pandas dataframe to json in specific format

My dataframe is
'col1' , 'col2'
A , 89
A , 232
C , 545
D , 998
and would like to export as follow :
{
'A' : [ 89, 232 ],
'C' : [545],
'D' : [998]
}
However, none of the to_json orient options (orient='records', ...) fit this format.
Is there a way to output it like this?
Use groupby to convert the values to lists, then to_json:
json = df.groupby('col1')['col2'].apply(list).to_json()
print (json)
{"A":[89,232],"C":[545],"D":[998]}
Detail:
print (df.groupby('col1')['col2'].apply(list))
col1
A [89, 232]
C [545]
D [998]
Name: col2, dtype: object
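The same grouping can be sketched without pandas, which makes the shape of the result explicit; a minimal standard-library version over the question's rows:

```python
import json

# (col1, col2) pairs from the question's dataframe.
rows = [("A", 89), ("A", 232), ("C", 545), ("D", 998)]

# Collect col2 values into a list per col1 key, then serialize.
grouped = {}
for key, value in rows:
    grouped.setdefault(key, []).append(value)

out = json.dumps(grouped)
print(out)
```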

fromJSON encoding issue

I am trying to convert a JSON object to an R dataframe. Here is the JSON object:
json <-
'[
{"Name" : "a", "Age" : 32, "Occupation" : "凡达"},
{"Name" : "b", "Age" : 21, "Occupation" : "打蜡设计费"},
{"Name" : "c", "Age" : 20, "Occupation" : "的拉斯克奖飞"}
]'
then I use fromJSON: mydf <- jsonlite::fromJSON(json). The result is
Name Age Occupation
1 a 32 <U+51E1><U+8FBE>
2 b 21 <U+6253><U+8721><U+8BBE><U+8BA1><U+8D39>
3 c 20 <U+7684><U+62C9><U+65AF><U+514B><U+5956><U+98DE>
I was wondering how this happens, and is there any solution?
Using the package rjson solves the problem, but its output is a list, while I want a dataframe.
Thank you.
I've tried Sys.setlocale(locale = "Chinese"); the characters are then indeed Chinese, but the results are still garbled, like below:
Name Age Occupation
1 a 32 ·²´ï
2 b 21 ´òÀ¯Éè¼Æ·Ñ
3 c 20 µÄÀ­Ë¹¿Ë½±·É