Convert array of JSON objects to string in PySpark

I need to create a custom JSON string from the columns of a PySpark DataFrame, so I wrote the UDF below, which returns a JSON string for each row.
The parameter "entities" is an array of JSON objects.
import json

def halResponse(entities, admantx, copilot_id):
    json_resp = "{\"analyzedContent\": {" + json.dumps(entities) + "}}"
    return json_resp
But the response is not proper JSON: instead of key: value pairs, I only get the values, without their keys (the actual values are replaced with * for security reasons).
Here is a sample response:
"analyzedContents": [
{
"entities": [
[
"******",
*,
*********,
[
[
"***********",
"***********",
"***********",
[
"*****************"
],
**********
]
],
"**************"
]
]
}
]
}
Please help me resolve this issue. After the fix, I expect a response like this:
"analyzedContents": [
{
"entities": [
[
"key":******",
"key":*,
"key":*********,
[
[
"key":"***********",
"key":"***********",
"key":"***********",
[
"key":"*****************"
],
"key":**********
]
],
"key":"**************"
]
]
}
]
}

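Try this without using a UDF; F.to_json serializes an array of structs with the field names as keys:
import pyspark.sql.functions as F

# Put the serialized array under an "entities" key so the result is valid JSON
df2 = df.withColumn(
    'response',
    F.concat(
        F.lit("{\"analyzedContent\": {\"entities\": "),
        F.to_json(F.col("entities")),
        F.lit("}}")
    )
)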
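As background on why the UDF output lost its keys: Spark passes struct values to a Python UDF as pyspark.sql.Row objects, and Row is a tuple subclass, so json.dumps writes each one as a JSON array. The stdlib-only sketch below reproduces this with namedtuple as a stand-in for Row (the Entity and Detail field names are made up for illustration). Inside a real UDF, the equivalent conversion would be [e.asDict(recursive=True) for e in entities]. Note also that wrapping an array directly in braces, as "{" + array + "}" does, is not valid JSON; the array needs a key such as "entities".

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for Spark struct values. Like pyspark.sql.Row, a
# namedtuple is a tuple subclass, so json.dumps encodes it as a JSON array
# and the field names are lost.
Detail = namedtuple("Detail", ["label", "score"])
Entity = namedtuple("Entity", ["name", "count", "details"])


def to_plain(value):
    """Recursively convert tuple-based records into dicts so keys survive."""
    if hasattr(value, "_asdict"):  # namedtuple record -> dict
        return {k: to_plain(v) for k, v in value._asdict().items()}
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


entities = [Entity("acme", 3, [Detail("org", 0.9)])]

# Values only, no keys
keyless = json.dumps(entities)

# Keys preserved, with the array placed under an explicit "entities" key
keyed = json.dumps({"analyzedContent": {"entities": to_plain(entities)}})

print(keyless)  # [["acme", 3, [["org", 0.9]]]]
print(keyed)
```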

API POST request in Julia

I am trying to convert some Python code to Julia. Here is the Python code:
import requests

url = "http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101G/BefUtvKon1749"
json = {
"query": [
{
"code": "Kon",
"selection": {
"filter": "item",
"values": [
"1",
"2"
]
}
},
{
"code": "ContentsCode",
"selection": {
"filter": "item",
"values": [
"000000LV"
]
}
}
],
"response": {
"format": "px"
}
}
r = requests.post(url=url, json=json)
Below is the Julia code, which does not work and fails with this error message:
syntax: { } vector syntax is discontinued around path:8
top-level scope at population_data.jl:8
using DataFrames, DataFramesMeta, HTTP, JSON3
url = "http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101G/BefUtvKon1749"
json = {
"query": [
{
"code": "Kon",
"selection": {
"filter": "item",
"values": [
"1",
"2",
"1+2"
]
}
},
{
"code": "ContentsCode",
"selection": {
"filter": "item",
"values": [
"000000LV"
]
}
}
],
"response": {
"format": "px"
}
}
r = HTTP.post(url, json)
My attempts to solve this are the following:
Converting the json variable to a string by wrapping it in """ triple quotes.
Converting the JSON string to Julia data types using JSON3.read().
Passing the converted JSON string to the POST request, which gives the following error:
IOError(Base.IOError("read: connection reset by peer (ECONNRESET)", -54) during request(http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101G/BefUtvKon1749)
None of it works, and I am not even sure that it is about the JSON format. It could be that I am passing the wrong parameters to the POST request. What should I do?
One way to solve this is to build the parameters as native Julia data structures, then use JSON to convert them to a string and send it as the body of your POST request.
Dictionaries in Julia are built with the syntax Dict(key => value), and arrays with the standard syntax [a, b, c]. The native Julia data structure equivalent to your parameters looks like this:
params = Dict(
"query" => [
Dict("code" => "Kon",
"selection" => Dict(
"filter" => "item",
"values" => [
"1",
"2",
"1+2"
]),
),
Dict("code"=> "ContentsCode",
"selection" => Dict(
"filter" => "item",
"values" => [
"000000LV"
]),
),
],
"response" => Dict(
"format" => "px"
))
Then, you can use JSON.json() to build the JSON representation of it as a string and pass it to the HTTP request:
using HTTP
using JSON
url = "http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101G/BefUtvKon1749"
# send the request
r = HTTP.request("POST", url,
["Content-Type" => "application/json"],
JSON.json(params))
# retrieve the response body as a string
b = String(r.body)
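The Julia answer does by hand what requests.post(url=url, json=json) does in the Python original: serialize the payload to a JSON string and send it with a Content-Type: application/json header. As a stdlib-only Python cross-check (urllib.request stands in for requests here), the sketch below builds the equivalent POST request without sending it, so the body and header can be inspected:

```python
import json
import urllib.request

url = "http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101G/BefUtvKon1749"

# Same payload as the Julia Dict above, as plain dicts and lists
params = {
    "query": [
        {"code": "Kon",
         "selection": {"filter": "item", "values": ["1", "2", "1+2"]}},
        {"code": "ContentsCode",
         "selection": {"filter": "item", "values": ["000000LV"]}},
    ],
    "response": {"format": "px"},
}

# Serialize to JSON and declare the body type, which requests does for you
# when you pass json=...
body = json.dumps(params).encode("utf-8")
req = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Building the Request sends nothing; to actually send it:
# with urllib.request.urlopen(req) as resp:
#     text = resp.read().decode("utf-8")
print(req.get_method(), req.get_header("Content-type"))
```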

How to retrieve list item at index based on name from Json object in Python

For example, given the following JSON:
{
"success": true,
"message": "''",
"result": [
{
"buy": [
{
"quantity": 12.37,
"rate": 32.55412402
}
],
"sell": [
{
"quantity": 12.37,
"rate": 32.55412402
}
]
}
]
}
How would I go about retrieving 'buy' and 'sell' and storing them in variables?
I can get the result via:
d = json.loads(string)
print(d['result'])
However, I can't see how to retrieve the buy object from there. For example, I've tried:
#print(d['result']['buy'])
#print(d['result'].buy)
#print(d['result'].indexOf('buy'))
All to no avail.
Try:
print(d['result'][0]['buy'])
should give you the buy object:
[{u'rate': 32.55412402, u'quantity': 12.37}]
If you inspect the type of d['result']:
print(type(d['result'])) # <type 'list'>
It's a list of length 1, so the [0] in d['result'][0] returns its first and only item, which is the dictionary you were expecting. You can then look up the key 'buy' on it, as you did in your first attempt:
#print(d['result']['buy'])
You just need to index the list with [0] first:
print(d['result'][0]['buy'])
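To store both sides in variables, as the question asks, index the list once and then look up each key. A minimal sketch, assuming the JSON text from the question is held in a string (the names raw, first, buy, sell and first_buy_rate are illustrative):

```python
import json

raw = """{
  "success": true,
  "message": "''",
  "result": [
    {
      "buy": [{"quantity": 12.37, "rate": 32.55412402}],
      "sell": [{"quantity": 12.37, "rate": 32.55412402}]
    }
  ]
}"""

d = json.loads(raw)

# "result" is a list: take its first (and only) element, then look up keys
first = d["result"][0]
buy = first["buy"]    # list of order dicts
sell = first["sell"]  # list of order dicts

# Each side is itself a list, so index again to reach a single order
first_buy_rate = buy[0]["rate"]
print(buy, sell, first_buy_rate)
```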

JSON to CSV conversion Linux terminal

I have the following example.json. How can I convert it to CSV to get the mean values (marked with ** below)?
I want example.csv to look like this:
305152,277504,320512
[
{
"name": "stats",
"columns": [
"time",
"mean"
],
"points": [
[
1444038496000,
**305152**
],
[
1444038494000,
**277504**
],
[
1444038492000,
**320512**
]
]
}
]
In Python, it could look like this:
import json

# Collect one CSV line per series in the file
results = []
with open('example.json', 'r') as f:
    content = json.load(f)
for element in content:
    # Each point is [time, mean]; take the mean at index 1
    results.append(','.join(str(y[1]) for y in element['points']))
with open('example.csv', 'w') as f:
    f.write('\n'.join(results))
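The code above hard-codes index 1 for the mean. A variant that looks the position up in each series' "columns" list keeps working if the column order changes. This is a sketch assuming every series lists a "mean" column; it uses the example data without the ** markers and builds the CSV text in memory instead of writing example.csv (mean_row is a hypothetical helper name):

```python
import json

# The example.json from the question, without the ** highlight markers
content = json.loads("""[
  {
    "name": "stats",
    "columns": ["time", "mean"],
    "points": [
      [1444038496000, 305152],
      [1444038494000, 277504],
      [1444038492000, 320512]
    ]
  }
]""")


def mean_row(element, column="mean"):
    """Join one series' values for `column` into a single CSV line."""
    idx = element["columns"].index(column)  # find the column by name
    return ",".join(str(point[idx]) for point in element["points"])


csv_text = "\n".join(mean_row(element) for element in content)
print(csv_text)  # 305152,277504,320512
```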

How to create existing JSON in groovy, using JsonBuilder?

I'm looking for the definition (structure) of an object that can be converted to the following JSON:
{
"header":{
"callbackUrl":"",
"clientOrderId":"A565132",
"clientOriginationId":"2345FE",
"serviceProvider":"VERIZON",
"transactionId":"EEDT44567"
},
"customer": {
"nationalIdType":"",
"nationalId":"",
"addresses":[
{
"type":"WORK",
"postalCode":"330066"
}
],
"serviceProviderAuthentication":[
{
"passcode":"",
"securityQuestion":"",
"securityAnswer":""
}
]
},
"accountPhoneNumber":"",
"accountNumber":""
}
If you're looking for an example of how JsonBuilder could be used to create the JSON you've given, here it is:
def json = new groovy.json.JsonBuilder()
json header: [
callbackUrl:"",
clientOrderId:"A565132",
clientOriginationId:"2345FE",
serviceProvider:"VERIZON",
transactionId:"EEDT44567"
],
customer:[
nationalIdType:"",
nationalId:"",
addresses: [
[
type:"WORK",
postalCode:"330066"
]
],
serviceProviderAuthentication:[
[
passcode:"",
securityQuestion:"",
securityAnswer:""
]
]
],
accountPhoneNumber:"",
accountNumber:""
json.toString()
You may have been unsure how to create JSON that has no named root. The answer is to pass a map: the named arguments above are collected into a single map, and each of its keys becomes a top-level field.
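The structure JsonBuilder needs is just nested maps and lists: a Groovy map [k: v] becomes a JSON object and a list [...] becomes a JSON array, so addresses: [[type: ..., postalCode: ...]] is a list holding one map. As a cross-check of that shape, here is the same payload written as Python dicts and lists (a Python analogue, not Groovy) and serialized with json.dumps:

```python
import json

# Python mirror of the Groovy map passed to JsonBuilder: maps -> dicts
# (JSON objects), lists -> lists (JSON arrays). The top level is a plain
# map, so the output has no extra root key.
payload = {
    "header": {
        "callbackUrl": "",
        "clientOrderId": "A565132",
        "clientOriginationId": "2345FE",
        "serviceProvider": "VERIZON",
        "transactionId": "EEDT44567",
    },
    "customer": {
        "nationalIdType": "",
        "nationalId": "",
        # A list holding one map -> a JSON array holding one object
        "addresses": [{"type": "WORK", "postalCode": "330066"}],
        "serviceProviderAuthentication": [
            {"passcode": "", "securityQuestion": "", "securityAnswer": ""}
        ],
    },
    "accountPhoneNumber": "",
    "accountNumber": "",
}

text = json.dumps(payload, indent=2)
print(text)
```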

render as JSON clips the trailing zeros of a floating-point value

I have a map like this, which has to be rendered as JSON in the output:
def formatedResult = [
results:[
[ Name:"foo", sex:"m", salary:171.900 ],
[ Name:"bar", sex:"m", salary:171.900 ]
]
]
I render the response like this:
withFormat {
json {
render formatedResult as JSON
}
}
This produces the following result:
{
results: [{
Name: "foo",
sex: "m",
salary: 171.9
}, {
Name: "bar",
sex: "m",
salary: 171.9
}]
}
But the trailing zeros are clipped from the salary. What should I do to get JSON that keeps them?
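If you are hard-coding your values as in your example, write them as strings:
def formatedResult = [
results:[
[ Name:"foo", sex:"m", salary:"171.900" ],
[ Name:"bar", sex:"m", salary:"171.900" ]
]
]
Or, if you are getting them from a variable, use the toString() method to convert them to strings:
def formatedResult = [
results:[
[ Name:"foo", sex:"m", salary:salary.toString()],
[ Name:"bar", sex:"m", salary:salary.toString()]
]
]
This keeps the zeros only if the value still carries them, as a BigDecimal such as 171.900 does; a Double prints as 171.9, so format it explicitly instead, e.g. String.format('%.3f', salary).
Finally, the render call stays the same: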
withFormat {
json {
render formatedResult as JSON
}
}
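The underlying reason is that a JSON number carries no trailing zeros, so a serializer is free to write the shortest form; only a string preserves the exact text. A quick cross-check with Python's json module (not Grails) shows the same behaviour:

```python
import json
from decimal import Decimal

# A float has no notion of trailing zeros: 171.900 == 171.9
as_number = json.dumps({"salary": 171.900})

# Formatting to a fixed number of decimals and sending a string keeps them
as_string = json.dumps({"salary": format(171.9, ".3f")})

# A Decimal keeps its scale, so str() preserves the zeros as well
as_decimal_string = json.dumps({"salary": str(Decimal("171.900"))})

print(as_number)          # {"salary": 171.9}
print(as_string)          # {"salary": "171.900"}
print(as_decimal_string)  # {"salary": "171.900"}
```

The trade-off of the string approach is that clients receive "171.900" as text and must parse it before doing any arithmetic.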