How to format json output in pyspark?

I am having trouble preserving the key order of my JSON and pretty-printing it in PySpark.
Below is sample code:
json_out = sqlContext.jsonRDD(sc.parallelize([json.dumps(info)]))
# here info is my ordered dictionary
json_out.toJSON().saveAsTextFile("file:///home//XXX//samplejson")
I also want my output as a single file, not as a partitioned dataset.
Could anyone help with pretty-printing the output JSON and preserving its key order in my case?
info sample:
Note: TypeA, TypeB, etc. are lists, meaning there can be more than one product in TypeA or TypeB.
{
"score": {
"right": ,
"wrong":
},
"articles": {
"TypeA": [{
"ID": 333,
"Name": "",
"S1": "",
"S2": "",
"S3": "",
"S4": ""
}],
"TypeB": [{
"ID": 123,
"Name": "",
"T1": "",
"T2": "",
"T3": "",
"T4": "",
"T5": "",
"T6": ""
}]
}
}
I have tried using json.dumps(info, indent=2), but it made no difference.
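A possible explanation, offered as a hedged note rather than a confirmed diagnosis: toJSON() re-serializes each row as one compact JSON string, so any indentation produced by json.dumps is lost on the round trip through Spark, and schema inference may also reorder the fields. saveAsTextFile additionally writes a directory of part files rather than one file. If info is small enough to live on the driver as a single dictionary (an assumption), one option is to skip Spark for the write and dump it directly. The sketch below uses a hypothetical stand-in for info with placeholder values and a hypothetical output path:

```python
import json
import os
import tempfile
from collections import OrderedDict

# Hypothetical stand-in for the ordered dictionary `info` from the question;
# the values (0, "") are placeholders, not real data.
info = OrderedDict([
    ("score", OrderedDict([("right", 0), ("wrong", 0)])),
    ("articles", OrderedDict([
        ("TypeA", [OrderedDict([("ID", 333), ("Name", "")])]),
        ("TypeB", [OrderedDict([("ID", 123), ("Name", "")])]),
    ])),
])


def write_pretty_json(data, path):
    """Write data to a single file as indented JSON, keeping key insertion order."""
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2)


# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "samplejson.json")
write_pretty_json(info, out_path)
```

json.dump walks the OrderedDict in insertion order and does not sort keys unless sort_keys=True is passed, so the file keeps "score" before "articles".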

Related

Transform complex json files using ADF

I want to transform multiple complex JSON files into one complex JSON file using Azure Data Factory dataflow.
The multiple complex input JSON files are in the following format:
{
"creationDate": "2022-01-19T17:00:17Z",
"count": 2,
"data": [
{
"id": "",
"name": "Project A",
"url": "",
"state": "Open",
"revision": 1,
"visibility": "private",
"lastUpdateTime": "2019-09-23T08:44:45.103Z"
},
{
"id": "",
"name": "Project B",
"url": "",
"state": "Done",
"revision": 1,
"visibility": "private",
"lastUpdateTime": "2019-12-31T09:38:49.16Z"
}
]
}
We want to transform those files to one single json file in the format:
[
{
"date": "2022-01-14",
"count": 2,
"projects": [
{
"name": "Project A",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project B",
"state": "",
"lastUpdateTime": ""
}
]
},
{
"date": "2022-01-17",
"count": 3,
"projects": [
{
"name": "Project A",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project B",
"state": "",
"lastUpdateTime": ""
},
{
"name": "Project C",
"state": "",
"lastUpdateTime": ""
}
]
}
]
We were using a derived column with the expression @(name=data.name, state=data.state).
Can someone help us with this? We have tried many things, such as a derived column and flattening first, but we can't get the output we want.
Thanks!
The solution in the end was pretty close to what we already had.
Our final solution is as follows:
First, flatten with Unroll by set to data. We also mapped creationDate to date.
Create a derived column called projects with the expression @(name=name,state=state,lastUpdateTime=lastUpdateTime,url=url)
Group activity with a Group by on date. Set the aggregate for count to first(count) and (this was the solution) set projects to collect(projects).
Select activity which selects the columns date, count and projects.
Sort activity with a sort on date, Ascending.
Sink with the file name option set to output to single file and the partition option set to Single partition.
Note:
Because the sink writes one big JSON file (output to single file), the sort order was not written correctly to the JSON, even though everything looked correct when we debugged the data flow (data preview). Strange behavior. Once we changed the Partition option of the Sort activity to Single partition, the JSON file had the right sort order.
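To make the shape of these steps easier to follow, here is a minimal Python sketch of the same flatten, group and collect logic outside of Data Factory. It is an illustration only, not the data flow itself: the function name transform, the sample documents, and the choice to keep only the first ten characters of creationDate as the date are all assumptions.

```python
def transform(documents):
    """Flatten each document's "data" array, then regroup the projects by date."""
    # Flatten: one row per element of "data", carrying date and count along.
    rows = []
    for doc in documents:
        for project in doc["data"]:
            rows.append({
                # Assumption: the date is the YYYY-MM-DD prefix of creationDate.
                "date": doc["creationDate"][:10],
                "count": doc["count"],
                # Derived column: a sub-structure holding the selected fields.
                "projects": {
                    key: project[key]
                    for key in ("name", "state", "lastUpdateTime", "url")
                },
            })

    # Group by date: keep the first count seen and collect every projects struct.
    groups = {}
    for row in rows:
        group = groups.setdefault(
            row["date"],
            {"date": row["date"], "count": row["count"], "projects": []},
        )
        group["projects"].append(row["projects"])

    # Select date, count and projects (already the only keys), sorted by date.
    return sorted(groups.values(), key=lambda group: group["date"])


# Hypothetical input documents in the format shown in the question.
sample_documents = [
    {
        "creationDate": "2022-01-17T17:00:17Z",
        "count": 1,
        "data": [
            {"name": "Project C", "state": "Open", "url": "",
             "lastUpdateTime": "2019-09-23T08:44:45.103Z"},
        ],
    },
    {
        "creationDate": "2022-01-14T09:00:00Z",
        "count": 2,
        "data": [
            {"name": "Project A", "state": "Open", "url": "",
             "lastUpdateTime": "2019-09-23T08:44:45.103Z"},
            {"name": "Project B", "state": "Done", "url": "",
             "lastUpdateTime": "2019-12-31T09:38:49.16Z"},
        ],
    },
]

result = transform(sample_documents)
```

The grouping step is the part that corresponds to collect(projects): without it, each project stays on its own row instead of being gathered into one array per date.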

How to get JSON data from attributes that start with # or @ in TypeScript

I am working with a specific API that returns JSON that looks like the sample below.
I want to get both the values stored under #text and under @attr, but I get error messages in TypeScript when I try to read them.
Try using:
album[0]["@attr"]
album[0]["artist"]["#text"]
In JSON you can read a value by its attribute name, and it works the same way whether the name starts with # or @. Because these names are not valid identifiers, use bracket notation rather than dot notation.
See the code below to get the value of your specified key:
Sample JSON:
{
"weeklyalbumchart": {
"album": [
{
"artist": {
"mbid": "data",
"#text": "Flying Lotus"
},
"mbid": "data",
"url": "",
"name": "",
"@attr": {
"rank": "1"
},
"playcount": "21"
},
{
"artist": {
"mbid": "data",
"#text": "Flying Lotus"
},
"mbid": "data",
"url": "",
"name": "",
"@attr": {
"rank": "1"
},
"playcount": "21"
}
]
}
}
Read JSON:
@attr ===> json["weeklyalbumchart"]["album"][0]["@attr"]
#text ===> json["weeklyalbumchart"]["album"][0]["artist"]["#text"]
Hope this helps you understand it.

Deep_search in big json object by method ruby

I have a problem with filtering a JSON object in Ruby.
1. My JSON object is a big array of two hashes.
2. Those hashes contain other hashes, which in turn contain more arrays and hashes.
My goal is to output the top-level hash that contains a specific value.
Examples are below.
The JSON file looks like this:
[#That's hash 0{
"id": 0,
"firstName": "",
"lastName": "",
"middleName": null,
"email": "",
"phones": [
null,
null
],
"groups": [{
"id": 0,
"name": ""
}],
"disabled": "",
"technologies": [{
"id": 0,
"name": "",
"children": [{
"id": 1,
"name": "",
"children": [{
"id": 2,
"name": "Farmer",
"children": []
}]
}]
}],
"fullName": ""
},
#That's hash1{
"id": 0,
"firstName": "",
"lastName": "",
"middleName": null,
"email": "",
"phones": [
null,
null
],
"groups": [{
"id": 0,
"name": ""
}],
"disabled": "",
"technologies": [{
"id": 0,
"name": "",
"children": [{
"id": 1,
"name": "",
"children": [{
"id": 2,
"name": "Not Farmer",
"children": []
}]
}]
}],
"fullName": ""
}
]
Pseudocode in Ruby (what I want to do):
file = File.read("example.json") #=> Reading JSON file
data_hash = JSON.parse(file, object_class: Hash) #=> Parsing JSON file
data = data_hash.filter #=> filter that hash if "technologies" is not empty!
data.get_hash_by_value(value) #=> For example, if I pass "Not Farmer" as value, the method must search all of data for that value and return hash1 (because hash0 does not include "Not Farmer")
I don't know how to approach this.
My idea is a recursive search method.
I wrote my own functions. Maybe it can help someone.
# Returns true if item, or any item nested under its "children", has the given name.
def check_children(item, name = "Farmer")
  return true if item["name"] == name
  item["children"].to_a.any? { |child_item| check_children(child_item, name) }
end

# Keep every top-level hash whose technologies contain the name at any depth.
# data_hash is the parsed array from the question.
data = data_hash.select do |item|
  item["technologies"].to_a.any? do |technologies_item|
    technologies_item["children"].to_a.any? { |children_item| check_children(children_item) }
  end
end
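The question asked for a search that works for any value at any depth, not only for "name" under "children". As a language-neutral illustration of that recursive idea, here is a minimal Python sketch. The function names contains_value and get_hashes_by_value and the trimmed sample records (with made-up ids 0 and 1) are hypothetical, and the match is an exact equality test:

```python
def contains_value(node, value):
    """Return True if value appears anywhere inside nested dicts and lists."""
    if isinstance(node, dict):
        return any(contains_value(child, value) for child in node.values())
    if isinstance(node, list):
        return any(contains_value(child, value) for child in node)
    # Leaf: a string, number, boolean or None.
    return node == value


def get_hashes_by_value(records, value):
    """Return the top-level records that contain value at any depth."""
    return [record for record in records if contains_value(record, value)]


# Hypothetical, trimmed-down version of the two hashes from the question.
records = [
    {"id": 0, "technologies": [
        {"id": 0, "name": "", "children": [
            {"id": 1, "name": "", "children": [
                {"id": 2, "name": "Farmer", "children": []},
            ]},
        ]},
    ]},
    {"id": 1, "technologies": [
        {"id": 0, "name": "", "children": [
            {"id": 1, "name": "", "children": [
                {"id": 2, "name": "Not Farmer", "children": []},
            ]},
        ]},
    ]},
]

matches = get_hashes_by_value(records, "Not Farmer")
```

Because the comparison is exact, searching for "Farmer" does not match the record that only contains "Not Farmer".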

R jsonlite: create JSON data in a specific format

I want to create JSON data using R's jsonlite package to load into DynamoDB using Python. I want the data to be in the structure shown below. How can I create this in R? I tried creating a data frame where one of the columns is a list and converting the data frame to JSON, but the result is not in the required format. I also tried converting a list that contains a list, but the structure of the output JSON is not what I want.
[
{
"ID": 100,
"title": "aa",
"more": {
"interesting":"yes",
"new":"no",
"original":"yes"
}
},
{
"ID": 110,
"title": "bb",
"more": {
"interesting":"no",
"new":"yes",
"original":"yes"
}
},
{
"ID": 200,
"title": "cc",
"more": {
"interesting":"yes",
"new":"yes",
"original":"no"
}
}
]
Here is my sample data and what I tried:
library(jsonlite)
ID=c(100,110,200)
Title=c("aa","bb","cc")
more=I(list(Interesting=c("yes","no","yes"),new=c("no","yes","yes"),original=c("yes","yes","no")))
a=list(ID=ID,Title=Title,more=more)
a=toJSON(a)
write(a,"temp.json") # this does not give the structure I want
This will produce what you need:
library(jsonlite)
ID=c(100,110,200)
Title=c("aa","bb","cc")
df <- data.frame(ID, Title)
more=data.frame(Interesting=c("yes","no","yes"),new=c("no","yes","yes"),original=c("yes","yes","no"))
df$more <- more
toJSON(df)
output:
[{
"ID": 100,
"Title": "aa",
"more": {
"Interesting": "yes",
"new": "no",
"original": "yes"
}
}, {
"ID": 110,
"Title": "bb",
"more": {
"Interesting": "no",
"new": "yes",
"original": "yes"
}
}, {
"ID": 200,
"Title": "cc",
"more": {
"Interesting": "yes",
"new": "yes",
"original": "no"
}
}
]

Create JSON.NET structure with JTokenWriter

Hey all I have the following json output that I would like to create:
{
"scheduleName": "",
"firstName": "",
"lastName": "",
"theRole": "",
"linker": "",
"Schedule": {
"ID": "",
"totalHrs": "",
"Mon": "",
"Tue": "",
"Wed": "",
"Thu": "",
"Fri": "",
"Sat": ""
},
"empInfo": {
"ID": "",
"Email": "",
"Phone": "",
"Active": "",
"Img": "",
"Badge": ""
},
"availability": {
"ID": "",
"Mon": "",
"Tue": "",
"Wed": "",
"Thu": "",
"Fri": "",
"Sat": ""
},
"training": {
"name": "",
"id": ""
}
}
I am following the Newtonsoft example "Create JSON with JTokenWriter", and I am wondering how to create the nested "Schedule", "empInfo", etc. objects in my JSON output, since the page has no examples of those types.
The only example it shows is structured like so:
{
"name1": "value1",
"name2": [
1,
2
]
}
The first few values are easy to create:
Dim jsonWriter As New JTokenWriter()
jsonWriter.WriteStartObject()
jsonWriter.WritePropertyName("scheduleName")
jsonWriter.WriteValue("value1")
jsonWriter.WritePropertyName("firstName")
jsonWriter.WriteValue("value2")
jsonWriter.WritePropertyName("lastName")
jsonWriter.WriteValue("value3")
jsonWriter.WritePropertyName("theRole")
jsonWriter.WriteValue("value4")
jsonWriter.WritePropertyName("linker")
jsonWriter.WriteValue("value5")
'"?": {
' "?": "?",
' "?": "?",
' etc....
'?
jsonWriter.WriteEndObject()
But that's where I have to stop since I do not know how to go about making the other structure.
To write a nested object as the value of a property, write the property name, then do a nested WriteStartObject(), followed by the properties to be written, and finally a nested WriteEndObject(). E.g.:
Dim jsonWriter As New JTokenWriter()
jsonWriter.WriteStartObject() 'Start the root object
jsonWriter.WritePropertyName("scheduleName")
jsonWriter.WriteValue("value1")
jsonWriter.WritePropertyName("Schedule") 'Write the "Schedule" property name
jsonWriter.WriteStartObject() 'Start the nested "Schedule" object
jsonWriter.WritePropertyName("ID")
jsonWriter.WriteValue("ID Value")
jsonWriter.WriteEndObject() 'End the Schedule object
jsonWriter.WriteEndObject() 'End the root object