jsonValue="{'Employee': ['{"userId":"rirani","jobTitleName":"Developer","firstName":"Romin","lastName":"Irani","preferredFullName":"Romin Irani","employeeCode":"E1","region":"CA","phoneNumber":"408-1234567","emailAddress":"romin.k.irani#gmail.com"}', '{"userId":"nirani","jobTitleName":"Developer","firstName":"Neil","lastName":"Irani","preferredFullName":"Neil Irani","employeeCode":"E2","region":"CA","phoneNumber":"408-1111111","emailAddress":"neilrirani#gmail.com"}', '{"userId":"thanks","jobTitleName":"Program Directory","firstName":"Tom","lastName":"Hanks","preferredFullName":"Tom Hanks","employeeCode":"E3","region":"CA","phoneNumber":"408-2222222","emailAddress":"tomhanks#gmail.com"}']}
"
with open("F://IDP Umesh//Data Transformation//test.json", 'w') as jsonFile:
    jsonFile.write(json.dumps(jsonValue))
Output from test.json:
{"Employee": ["{\"userId\":\"rirani\",\"jobTitleName\":\"Developer\",\"firstName\":\"Romin\",\"lastName\":\"Irani\",\"preferredFullName\":\"Romin Irani\",\"employeeCode\":\"E1\",\"region\":\"CA\",\"phoneNumber\":\"408-1234567\",\"emailAddress\":\"romin.k.irani#gmail.com\"}", "{\"userId\":\"nirani\",\"jobTitleName\":\"Developer\",\"firstName\":\"Neil\",\"lastName\":\"Irani\",\"preferredFullName\":\"Neil Irani\",\"employeeCode\":\"E2\",\"region\":\"CA\",\"phoneNumber\":\"408-1111111\",\"emailAddress\":\"neilrirani#gmail.com\"}", "{\"userId\":\"thanks\",\"jobTitleName\":\"Program Directory\",\"firstName\":\"Tom\",\"lastName\":\"Hanks\",\"preferredFullName\":\"Tom Hanks\",\"employeeCode\":\"E3\",\"region\":\"CA\",\"phoneNumber\":\"408-2222222\",\"emailAddress\":\"tomhanks#gmail.com\"}"]}
How can I remove the '\' characters from the JSON content and make it valid JSON?
I'd appreciate any help with this.
Thanks
Try this.
import json
jsonValue={'Employee': ['{"userId":"rirani","jobTitleName":"Developer","firstName":"Romin","lastName":"Irani","preferredFullName":"Romin Irani","employeeCode":"E1","region":"CA","phoneNumber":"408-1234567","emailAddress":"romin.k.irani#gmail.com"}', '{"userId":"nirani","jobTitleName":"Developer","firstName":"Neil","lastName":"Irani","preferredFullName":"Neil Irani","employeeCode":"E2","region":"CA","phoneNumber":"408-1111111","emailAddress":"neilrirani#gmail.com"}', '{"userId":"thanks","jobTitleName":"Program Directory","firstName":"Tom","lastName":"Hanks","preferredFullName":"Tom Hanks","employeeCode":"E3","region":"CA","phoneNumber":"408-2222222","emailAddress":"tomhanks#gmail.com"}']}
jsonValue['Employee'] = [json.loads(i ) for i in jsonValue['Employee']]
print(jsonValue)
with open("test.json", 'w') as jsonFile:
    jsonFile.write(json.dumps(jsonValue))
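The problem with your code is that each element of the Employee list is already a JSON-formatted string, so json.dumps encodes it a second time and escapes its quotes. dumps is meant for converting Python objects such as dicts into a JSON-formatted string, so parse the inner strings with json.loads before dumping.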
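To make the difference concrete, here is a minimal sketch using a shortened, hypothetical version of the data (the `json_value` contents are an assumption for illustration, not the full record from the question). It compares dumping the list of strings as-is with dumping it after each element has been parsed.

```python
import json

# Hypothetical, shortened version of the data from the question:
# each list element is a JSON-encoded string, not a dict.
json_value = {"Employee": ['{"userId":"rirani","employeeCode":"E1"}']}

# Dumping as-is encodes the inner string a second time, so its quotes get escaped.
double_encoded = json.dumps(json_value)
print(double_encoded)

# Parsing each element first turns it into a dict, so dumps() encodes it only once.
json_value["Employee"] = [json.loads(item) for item in json_value["Employee"]]
clean = json.dumps(json_value)
print(clean)
```

The backslashes in the first result are not corruption; they are the correct JSON encoding of a string that itself contains quotes.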
I have a CSV file that pandas reads like this:
[image: CSV read by pandas]
But when I read it with PySpark, it turns out like this:
[image: CSV read by PySpark]
What's wrong with the delimiter in Spark and how can I fix it?
From the posted images, %2C, which is the URL-encoded form of a comma (,), seems to be your delimiter.
Set the delimiter to %2C and also use the header option (a multi-character delimiter requires Spark 3.0 or later):
df = spark.read.option("header",True).option("delimiter", "%2C").csv(path)
Input CSV File:
date%2Copening%2Chigh%2Clow%2Cclose%2Cadjclose%2Cvolume
2022-12-09%2C100%2C101%2C99%2C99.5%2C99.5%2C10000000
2022-12-09%2C200%2C202%2C199%2C199%2C199.1%2C20000000
2022-12-09%2C300%2C303%2C299%2C299%2C299.2%2C30000000
Output dataframe:
+----------+-------+----+---+-----+--------+--------+
|date |opening|high|low|close|adjclose|volume |
+----------+-------+----+---+-----+--------+--------+
|2022-12-09|100 |101 |99 |99.5 |99.5 |10000000|
|2022-12-09|200 |202 |199|199 |199.1 |20000000|
|2022-12-09|300 |303 |299|299 |299.2 |30000000|
+----------+-------+----+---+-----+--------+--------+
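If the file is small enough to handle outside Spark, the same idea can be checked with the standard library alone. This is only a sketch under that assumption: it decodes the percent-escapes with `urllib.parse.unquote` and then parses the result as ordinary comma-separated text. Note that `unquote` decodes every percent-escape in the text, so it is only safe when the field values contain no escapes that should be preserved.

```python
import csv
import io
from urllib.parse import unquote

# Two sample lines taken from the input file shown above.
raw = (
    "date%2Copening%2Chigh%2Clow%2Cclose%2Cadjclose%2Cvolume\n"
    "2022-12-09%2C100%2C101%2C99%2C99.5%2C99.5%2C10000000\n"
)

# unquote() turns every percent-escape back into its character (%2C -> ",").
decoded = unquote(raw)

# After decoding, the text is plain CSV and the csv module can split it.
rows = list(csv.reader(io.StringIO(decoded)))
print(rows[0])
print(rows[1])
```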
I have several ugly json strings like the following:
test_string = '{\\"test_key\\": \\"Testing tilde \\u00E1\\u00F3\\u00ED\\"}'
that I need to transform into a more visually friendly dictionary and then save to a file:
{'test_key': 'Testing tilde áóí'}
So for that I am doing:
test_string = test_string.replace("\\\"", "\"") # I suppose there is a safer way to do this
print(test_string)
#{"test_key": "Testing tilde \u00E1\u00F3\u00ED"}
test_dict = json.loads(test_string, strict=False)
print(test_dict)
#{'test_key': 'Testing tilde áóí'}
At this point test_dict seems correct. Then I save it to a file:
with open('test.json', "w") as json_w_file:
    json.dump(test_dict, json_w_file)
At this point the content of test.json is the ugly version of the json:
{"test_key": "Testing tilde \u00E1\u00F3\u00ED"}
Is there a safer way to transform my ugly json to a dictionary?
Then how could I save the visually friendly version of my dictionary to a file?
Python 3
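The string looks like double-encoded JSON to me. This decodes it and writes a UTF-8 JSON file.
import json
test_string = '{\\"test_key\\": \\"Testing tilde \\u00E1\\u00F3\\u00ED\\"}'
# Wrap the text in quotes so the first loads() removes one level of escaping,
# then parse the resulting JSON text with the second loads().
test_dict = json.loads(json.loads(f'"{test_string}"'))
# Set the encoding explicitly so the file is UTF-8 regardless of the platform default.
with open('test.json', "w", encoding="utf-8") as json_w_file:
    json.dump(test_dict, json_w_file, ensure_ascii=False)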
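The two decoding passes and the effect of `ensure_ascii` can be seen separately in the following sketch. It works on strings only, so nothing is written to disk; the variable names `inner_json`, `ascii_text` and `utf8_text` are introduced here purely for illustration.

```python
import json

test_string = '{\\"test_key\\": \\"Testing tilde \\u00E1\\u00F3\\u00ED\\"}'

# First pass: treat the text as the body of a JSON string literal, which
# removes one level of escaping. Second pass: parse the resulting JSON text.
inner_json = json.loads(f'"{test_string}"')
test_dict = json.loads(inner_json)

# Default behaviour (ensure_ascii=True) escapes every non-ASCII character.
ascii_text = json.dumps(test_dict)
# ensure_ascii=False keeps the accented characters as they are.
utf8_text = json.dumps(test_dict, ensure_ascii=False)

print(ascii_text)
print(utf8_text)
```

Both strings are valid JSON and decode to the same dictionary; they differ only in how readable the file is to a human.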
I'd like to add one new line in front of each of my json document before Spark writes it into my s3 bucket:
df.createOrReplaceTempView("ParquetTable")
val parkSQL = spark.sql("select LAST_MODIFIED_BY, LAST_MODIFIED_DATE, NVL(CLASS_NAME, className) as CLASS_NAME, DECISION, TASK_TYPE_ID from ParquetTable")
parkSQL.show(false)
parkSQL.count()
parkSQL.write.json("s3://test-bucket/json-output-7/")
With only this command, it produces files with the contents below:
{"LAST_MODIFIED_BY":"david","LAST_MODIFIED_DATE":"2018-06-26 12:02:03.0","CLASS_NAME":"/SC/Trade/HTS_CA/1234abcd","DECISION":"AGREE","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5"}
{"LAST_MODIFIED_BY":"sarah","LAST_MODIFIED_DATE":"2018-08-26 12:02:03.0","CLASS_NAME":"/SC/Import/HTS_US/9876abcd","DECISION":"DISAGREE","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5"}
But what I'd like to achieve is something like this:
{"index":{}}
{"LAST_MODIFIED_BY":"david","LAST_MODIFIED_DATE":"2018-06-26 12:02:03.0","CLASS_NAME":"/SC/Trade/HTS_CA/1234abcd","DECISION":"AGREE","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5"}
{"index":{}}
{"LAST_MODIFIED_BY":"sarah","LAST_MODIFIED_DATE":"2018-08-26 12:02:03.0","CLASS_NAME":"/SC/Import/HTS_US/9876abcd","DECISION":"DISAGREE","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5"}
Any insight on how to achieve this result would be greatly appreciated!
The code below concatenates {"index":{}} with the JSON form of each existing row in the DataFrame, then saves the result using the text format, so each row produces two lines.
import org.apache.spark.sql.functions.{concat_ws, lit, struct, to_json}
import spark.implicits._ // enables the $"column" syntax

df
  .select(
    lit("""{"index":{}}""").as("index"),
    to_json(struct($"*")).as("json_data")
  )
  .select(
    concat_ws(
      "\n", // Puts the index column and the row JSON on two separate lines.
      $"index",
      $"json_data"
    ).as("data")
  )
  .write
  .format("text") // Required: the json writer would encode the string again.
  .save("s3://test-bucket/json-output-7/")
Final Output
cat part-00000-24619b28-6501-4763-b3de-1a2f72a5a4ec-c000.txt
{"index":{}}
{"CLASS_NAME":"/SC/Trade/HTS_CA/1234abcd","DECISION":"AGREE","LAST_MODIFIED_BY":"david","LAST_MODIFIED_DATE":"2018-06-26 12:02:03.0","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5"}
{"index":{}}
{"CLASS_NAME":"/SC/Import/HTS_US/9876abcd","DECISION":"DISAGREE","LAST_MODIFIED_BY":"sarah","LAST_MODIFIED_DATE":"2018-08-26 12:02:03.0","TASK_TYPE_ID":"abcd1234-832b-43b6-afa6-361253ffe1d5"}
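For comparison, the same interleaving can be sketched in plain Python, without Spark. This assumes the rows are already available as a list of dictionaries; the `rows` sample data (shortened to two fields) and the helper name `interleave_index_lines` are hypothetical and not part of the answer above.

```python
import json

# Hypothetical rows standing in for the DataFrame contents (shortened).
rows = [
    {"LAST_MODIFIED_BY": "david", "DECISION": "AGREE"},
    {"LAST_MODIFIED_BY": "sarah", "DECISION": "DISAGREE"},
]


def interleave_index_lines(records):
    """Return one {"index":{}} line followed by one JSON line per record."""
    lines = []
    for record in records:
        lines.append('{"index":{}}')
        # Compact separators match the no-space style of the Spark output.
        lines.append(json.dumps(record, separators=(",", ":")))
    return lines


bulk_text = "\n".join(interleave_index_lines(rows))
print(bulk_text)
```

As in the Scala version, the key point is that the document lines are serialised to JSON first and the result is then written as plain text.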
counter={"a":1,"b":2}
with open('egg.json' , 'w') as json_file:
    json.dump(counter, json_file)
So when I review my json file, it shows this:
{a:1 , b:2}
But I need it to be something like this:
[ [a:1], [b:2] ]
I've already tried adding
json.dump(counter, json_file, separator (' [ ', ' ] ')
But nothing will do the trick...
Is there a way to format the json file like the way you can format a CSV file?
I'd really like to know..... Thanks.
[a:1], [b:2] isn't valid json, so using the json module won't help you here.
If for some reason you want a formatted string output, you could instead do the following (don't call the file egg.json since it won't be valid json!):
counter = {'a':1, 'b':2}
output = []
# Build one "[key:value]" fragment per entry, in key order.
for k, v in sorted(counter.items()):
    output.append('[{}:{}]'.format(k, v))
with open('egg.txt', 'w') as txt_f:
    txt_f.write(', '.join(output))
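If the goal is only to keep each key/value pair grouped on its own, one possible alternative that does stay valid JSON is a list of single-key objects. This is a sketch of that idea, not what the question literally asked for; the name `pairs` is introduced here for illustration.

```python
import json

counter = {"a": 1, "b": 2}

# One single-key object per entry keeps the pairs grouped and is valid JSON.
pairs = [{key: value} for key, value in sorted(counter.items())]
text = json.dumps(pairs)
print(text)  # [{"a": 1}, {"b": 2}]
```

Because the result is valid JSON, it can be written to a .json file with json.dump and read back with json.load.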
I have this JSON text:
data = {"one":"number","two":"string","three":"number","four":[{"five":"number","six":"string"},{"five":"number","six":"string"}]}
How can I get the number under "five" and the string under "six" using Python 3.3 and the json module?
P.S.: If I do print(data['five']), it fails with this error:
print(data['five'])
KeyError: 'five'
Thanks,
Marco
Try this:
data = {"one":"number","two":"string","three":"number","four":[{"five":"number","six":"string"},{"five":"number","six":"string"}]}
print(data['four'][0]['five']) # number
print(data['four'][0]['six']) # string
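The KeyError occurs because "five" and "six" are not top-level keys; they live inside the dictionaries stored in the list under "four". To read them from every element rather than only the first, a short loop or comprehension is enough. In this sketch the name `pairs` is introduced for illustration.

```python
data = {
    "one": "number",
    "two": "string",
    "three": "number",
    "four": [
        {"five": "number", "six": "string"},
        {"five": "number", "six": "string"},
    ],
}

# data["four"] is a list of dicts, so index into each element in turn.
pairs = [(item["five"], item["six"]) for item in data["four"]]
print(pairs)
```

If the data arrives as JSON text rather than a Python dict, call json.loads on it first and then access it the same way.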