How to parse a JSON object from JSON in Postgres

I have one big JSON object.
How can I achieve the SQL below in PostgreSQL without using a table?
SELECT value->'col1' AS mycolumn
FROM json_object_keys('{"jcol1": "A", "jcol2": "B"}') as value
The JSON is {"activities-heart":[{"dateTime":"2016-10-17","restingHeartRate":65}]}
Expected output: heartrate: 65
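For reference, Postgres's JSON operators can do this without a table: -> steps into an object key or array element, and ->> returns the value as text. A minimal sketch, driven from Python via psycopg2 purely for illustration (the client choice and connection string are my assumptions; the query itself is the point):

import psycopg2  # assumption: any Postgres client would work the same way

QUERY = """
SELECT '{"activities-heart":[{"dateTime":"2016-10-17","restingHeartRate":65}]}'::json
       -> 'activities-heart' -> 0 ->> 'restingHeartRate' AS heartrate;
"""

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute(QUERY)
    print(cur.fetchone()[0])  # prints: 65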

Related

Convert nested JSON string column into map type column in Spark

overall aim
I have data landing in blob storage from an Azure service in the form of JSON files, where each line in a file is a nested JSON object. I want to process this with Spark and finally store it as a Delta table with nested struct/map type columns, which can later be queried downstream using dot notation, columnName.key.
data nesting visualized
{
  key1: value1
  nestedType1: {
    key1: value1
    keyN: valueN
  }
  nestedType2: {
    key1: value1
    nestedKey: {
      key1: value1
      keyN: valueN
    }
  }
  keyN: valueN
}
current approach and problem
I am not using the default Spark JSON reader, as it was incorrectly parsing some of the files. Instead I load the files as text files and parse them in UDFs using Python's json module (e.g. below), after which I use explode and pivot to get the first level of keys into columns (sketched after the UDF):
import json
from pyspark.sql.functions import udf

@udf('MAP<STRING,STRING>')
def get_key_val(x):
    # parse one line of text into a map of first-level keys to values
    try:
        return json.loads(x)
    except Exception:
        return None
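For context, a minimal sketch of that explode-and-pivot step as I understand it (the path, row id, and column names are my assumptions):

import pyspark.sql.functions as f

raw = spark.read.text("path/to/landing/")  # hypothetical path; one JSON object per line
mapped = raw.select(get_key_val("value").alias("m"))

# explode the map into (key, value) rows, then pivot the keys into columns
flat = (mapped
        .withColumn("id", f.monotonically_increasing_id())
        .select("id", f.explode("m"))
        .groupBy("id").pivot("key").agg(f.first("value")))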
After this initial transformation I now need to convert the nestedType columns to valid map types as well. Since the initial function returns map<string,string>, the values in the nestedType columns are no longer valid JSON, so I cannot use json.loads; instead I fall back on regex-based string operations:
import re
from pyspark.sql.functions import udf

@udf('MAP<STRING,STRING>')
def convert_map(s):
    # split the stringified map ("k1=v1, k2=v2, ...") back into key/value pairs
    try:
        regex = re.compile(r"""\w+=.*?(?:(?=,(?!"))|(?=}))""")
        return dict((a.split('=')[0].strip(), a.split('=')[1]) for a in regex.findall(s))
    except Exception:
        return None
This is fine for the second level of nesting, but going deeper would require yet another UDF and further complications.
question
How can I use a Spark UDF or native Spark functions to parse the nested JSON data so that it is queryable in columnName.key format?
There is no restriction on the Spark version. Hopefully I was able to explain this properly; do let me know if you want me to add some sample data and code. Any help is appreciated.
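One native-Spark sketch of what the question asks for, under my assumptions (a reasonably recent Spark — schema_of_json appeared in 2.4 and from_json takes its Column result directly in 3.x — lines sharing one schema, and a hypothetical path): infer the schema from a sample line, then apply from_json to the whole column, which yields proper struct columns queryable with dot notation:

import pyspark.sql.functions as f

raw = spark.read.text("path/to/landing/")  # hypothetical path; one JSON object per line

# infer the full nested schema from a single sample line
sample = raw.first()["value"]
schema = f.schema_of_json(f.lit(sample))

# parse every line with that schema and flatten the top level into columns
parsed = raw.select(f.from_json("value", schema).alias("data")).select("data.*")
parsed.select("nestedType2.nestedKey.key1").show()  # dot-notation access works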

Extract nested JSON content from JSON with the JSON Extractor in JMeter

I have JSON content inside another JSON that I need to extract as it is, without parsing its contents:
{
  "id": 555,
  "name": "aaa",
  "JSON": "{\r\n \"fake1\": {},\r\n \"fake2\": \"bbbb\",\r\n \"fake3\": \"eee\" \r\n}",
  "after1": 1,
  "after2": "test"
}
When I use JSON Extractor with JSON Path expression:
$.JSON
It returns:
"{
"fake1": {},
"fake2": "bbbb",
"fake3": "eee"
}"
whereas I need to get the raw string:
"{\r\n \"fake1\": {},\r\n \"fake2\": \"bbbb\",\r\n \"fake3\": \"eee\" \r\n}"
I think you need to switch to a JSR223 PostProcessor instead of the JSON Extractor and use the following code:
def json = new groovy.json.JsonSlurper().parse(prev.getResponseData()).JSON  // the nested JSON arrives as a plain String
vars.put('rawString', org.apache.commons.text.StringEscapeUtils.escapeJson(json))  // re-escape it back to its raw wire form
You will be able to refer the extracted value as ${rawString} where required.
More information:
Apache Groovy - Parsing and producing JSON
Apache Groovy: What Is Groovy Used For?
console.log(JSON.stringify(data.JSON))
Here data is your parsed JSON.
First extract the nested value as data.JSON, then re-serialize it with JSON.stringify().
The confusing part is that you named the key in your JSON object "JSON" itself.
In JS, once the outer JSON is parsed, you always reach a nested value simply via data.key_name, where data is the parsed object and key_name is the nested key.

JSON conversion within PySpark: Cannot parse the schema in JSON format: Failed to convert the JSON string (big JSON string) to a data type

I'm having trouble with JSON conversion within PySpark when working with complex nested-struct columns. The schema argument to from_json doesn't seem to behave. Example:
import pyspark.sql.functions as f

df = spark.createDataFrame([[1, 'a'], [2, 'b'], [3, 'c']], ['rownum', 'rowchar']) \
    .withColumn('struct', f.expr("transform(array(1,2,3), i -> named_struct('a1', rownum*i, 'a2', rownum*i*2))"))
df.display()

# both round-trip attempts below raise the error quoted in the title
df.withColumn('struct', f.to_json('struct')).withColumn('struct', f.from_json('struct', df.schema['struct'])).display()
df.withColumn('struct', f.to_json('struct')).withColumn('struct', f.from_json('struct', df.select('struct').schema)).display()
fails with
Cannot parse the schema in JSON format: Failed to convert the JSON string (big JSON string) to a data type
Not sure if this is a syntax error on my end, an edge case that's failing, the wrong way to do things, or something else.
You're not passing the correct schema to from_json. Try this instead:
df.withColumn('struct', f.to_json('struct')) \
  .withColumn('struct', f.from_json('struct', df.schema["struct"].dataType)) \
  .display()
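The distinction, for what it's worth: df.schema["struct"] returns a StructField (name, type, and nullability bundled together), while from_json expects the DataType itself, which .dataType unwraps:

field = df.schema["struct"]
print(type(field))           # <class 'pyspark.sql.types.StructField'>
print(type(field.dataType))  # <class 'pyspark.sql.types.ArrayType'>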

Siddhi - How to generate a JSON string

I'm trying to generate a JSON string by combining various columns and save it into a Postgres table with a JSON datatype. From the documentation, reading from a JSON string is clear:
define stream InputStream(json string);
from InputStream
select json:getString(json,"$.name") as name
insert into OutputStream;
But can we build the JSON in-flight and insert it into the table? Something like...
select '{"myname":json:getString(json,"$.name")}' as nameJSON
insert into postgresDB
Where nameJSON will be a JSON datatype in Postgres.
Any help would be greatly appreciated.
You can use json:setElement to create JSON from the attributes:
from OutputStream
select json:setElement("{}", "$", json:getString(json,"$.name"), "myname") as value
insert into TempStream;

Spark SQL - Convert DataFrame to JSON

My requirement is to pass a DataFrame as an input parameter to a Scala class which saves the data in JSON format to HDFS.
The input parameter looks like this:
case class ReportA(
  parm1: String,
  parm2: String,
  parm3: Double,
  parm4: Double,
  parm5: DataFrame
)
I have created a JSON object for this parameter like:
def write(xx: ReportA) = JsObject(
  "field1" -> JsString(xx.parm1),
  "field2" -> JsString(xx.parm2),
  "field3" -> JsNumber(xx.parm3),
  "field4" -> JsNumber(xx.parm4),
  "field5" -> JsArray(xx.parm5)
)
parm5 is a DataFrame that I want to convert to a JSON array.
How can I convert the DataFrame to a JSON array?
Thank you for your help!!!
A DataFrame can be seen as the equivalent of a plain old table in a database, with rows and columns. You can't just get a simple array from it; the closest you would come to an array is the following structure:
[
  "col1": [val1, val2, ..],
  "col2": [val3, val4, ..],
  "col3": [val5, val6, ..]
]
To achieve a similar structure, you could use the toJSON method of the DataFrame API to get an RDD<String> and then call collect on it (be careful of OutOfMemory exceptions).
You now have an Array[String], which you can simply transform into a JsonArray, depending on the JSON library you are using.
Beware though: this seems like a really bizarre way to use Spark. You generally don't output and transform an RDD or a DataFrame directly into one of your objects; you usually spill it out onto a storage solution.
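For illustration, a minimal sketch of that toJSON-then-collect approach, shown in PySpark for brevity (my assumption is that the Scala DataFrame API's identical toJSON method maps over directly; the collect carries the same OutOfMemory caveat):

df = spark.createDataFrame([(1, "a"), (2, "b")], ["col1", "col2"])

# toJSON serializes each Row to a JSON string; collect pulls them to the driver
rows = df.toJSON().collect()             # ['{"col1":1,"col2":"a"}', '{"col1":2,"col2":"b"}']

json_array = "[" + ",".join(rows) + "]"  # a JSON array of row objects, ready for any JSON library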