How to read a field from nested JSON?

This is my test JSON file:
{
"item" : {
"fracData" : [ ],
"fractimeData" : [ {
"number" : "1232323232",
"timePeriods" : [ {
"validFrom" : "2021-08-03"
} ]
} ],
"Module" : [ ]
}
}
This is how I read the JSON file:
starhist_test_df = spark.read.json("/mapr/xxx/yyy/ttt/dev/rawdata/Test.json", multiLine=True)
starhist_test_df.createOrReplaceTempView("v_test_df")
This query works.
df_test_01 = spark.sql("""
select item.fractimeData.number from v_test_df""")
df_test_01.collect();
Result
[Row(number=['1232323232'])]
But this query doesn't work.
df_test_01 = spark.sql("""
select item.fractimeData.timePeriods.validFrom from v_test_df""")
df_test_01.collect();
Error
cannot resolve 'v_test_df.`item`.`fractimeData`.`timePeriods`['validFrom']' due to data type mismatch: argument 2 requires integral type, however, ''validFrom'' is of string type.; line 3 pos 0;
What do I have to change, to read the validFrom field?

Dot notation for accessing values works with struct or array<struct> types.
The schema for the field number in item.fractimeData is string, and accessing it via dot notation returns an array<string>, since fractimeData is an array.
Similarly, the schema for the field timePeriods in item.fractimeData is array<struct<validFrom>>, and accessing it via dot notation wraps it into another array, resulting in a final schema of array<array<struct<validFrom>>>.
The error you get is because dot notation works on array<struct> but not on array<array>.
Hence, flatten the result of item.fractimeData.timePeriods to get back an array<struct<validFrom>>, and then apply the dot notation.
df_test_01 = spark.sql("""
select flatten(item.fractimeData.timePeriods).validFrom as validFrom from v_test_df""")
df_test_01.collect()
"""
[Row(validFrom=['2021-08-03', '2021-08-03'])]
"""

Related

How to parse multidimensional json array

I'm trying to retrieve some specific data from a JSON value stored in my database.
Here is my fiddle: https://www.db-fiddle.com/f/5qZhsyddqJNej2NGj1x1hi/1
An example of a JSON string:
{
"complexProperties":[
{
"properties":{
"key":"Registred",
"Value":"123456789"
}
},
{
"properties":{
"key":"Urgency",
"Value":"Total"
}
},
{
"properties":{
"key":"ImpactScope",
"Value":"All"
}
}
]
}
In this case I need to retrieve the value of Registred, which is 123456789.
Here is the query I tried first, to retrieve all values:
SELECT CAST(data AS jsonb)::json->>'complexProperties'->'properties' AS Registred FROM jsontesting
Query Error: error: operator does not exist: text -> unknown
You can use a JSON Path expression:
select jsonb_path_query_first(data, '$.complexProperties[*].properties ? (@.key == "Registred").Value')
from jsontesting;
This returns a jsonb value. If you need to convert that to a text value, use jsonb_path_query_first(...) #>> '{}'
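For example, a minimal sketch against your jsontesting table (assuming the data column is already jsonb; add a cast otherwise):
-- #>> with an empty path unwraps the jsonb scalar to plain text
select jsonb_path_query_first(
         data,
         '$.complexProperties[*].properties ? (@.key == "Registred").Value'
       ) #>> '{}' as registred
from jsontesting;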
An alternative that first flattens the JSON field (the arrj subquery) and then performs an old-school select, using your jsontesting table:
select (j -> 'properties' ->> 'Value')
from
(
select json_array_elements(data::json -> 'complexProperties') as j
from jsontesting
) as arrj
where j -> 'properties' ->> 'key' = 'Registred';

How to merge a dynamically named record with a static one in Dhall?

I'm creating an AWS Step Function definition in Dhall. However, I don't know how to create the common structure used for Choice states, such as the one in the example below:
{
"Not": {
"Variable": "$.type",
"StringEquals": "Private"
},
"Next": "Public"
}
The Not is pretty straightforward using mapKey and mapValue. If I define a basic Comparison:
{ Type =
{ Variable : Text
, StringEquals : Optional Text
}
, default =
{ Variable = "foo"
, StringEquals = None Text
}
}
And the types:
let ComparisonType = < And | Or | Not >
And adding a helper function to render the type as Text for the mapKey:
let renderComparisonType = \(comparisonType : ComparisonType )
-> merge
{ And = "And"
, Or = "Or"
, Not = "Not"
}
comparisonType
Then I can use them in a function to generate the record halfway:
let renderRuleComparisons =
\( comparisonType : ComparisonType ) ->
\( comparisons : List Comparison.Type ) ->
let keyName = renderComparisonType comparisonType
let compare = [ { mapKey = keyName, mapValue = comparisons } ]
in compare
If I run that using:
let rando = Comparison::{ Variable = "$.name", StringEquals = Some "Cow" }
let comparisons = renderRuleComparisons ComparisonType.Not [ rando ]
in comparisons
Running that through dhall-to-json outputs the first part:
{
"Not": {
"Variable": "$.name",
"StringEquals": "Cow"
}
}
... but I've been struggling to merge that with "Next": "Public". I've tried all the record merge operators like /\, //, etc., and they keep giving me various type errors I don't truly understand yet.
First, I'll include an approach that does not type-check as a starting point to motivate the solution:
let rando = Comparison::{ Variable = "$.name", StringEquals = Some "Cow" }
let comparisons = renderRuleComparisons ComparisonType.Not [ rando ]
in comparisons # toMap { Next = "Public" }
toMap is a keyword that converts records to key-value lists, and # is the list concatenation operator. The Dhall CheatSheet has a few examples of how to use both of them.
The above solution doesn't work because # cannot merge lists with different element types. The left-hand side of the # operator has this type:
comparisons : List { mapKey : Text, mapValue : Comparison.Type }
... whereas the right-hand side of the # operator has this type:
toMap { Next = "Public" } : List { mapKey : Text, mapValue : Text }
... so the two Lists cannot be merged as-is due to the different types for the mapValue field.
There are two ways to resolve this:
Approach 1: Use a union whenever there is a type conflict
Approach 2: Use a weakly-typed JSON representation that can hold arbitrary values
Approach 1 is the simpler solution for this particular example and Approach 2 is the more general solution that can handle really weird JSON schemas.
For Approach 1, dhall-to-json will automatically strip non-empty union constructors (leaving behind the value they were wrapping) when translating to JSON. This means that you can transform both arguments of the # operator to agree on this common type:
List { mapKey : Text, mapValue : < State : Text | Comparison : Comparison.Type > }
... and then you should be able to concatenate the two lists of key-value pairs and dhall-to-json will render them correctly.
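For example, a minimal sketch of Approach 1, reusing the Comparison schema and the rando binding from the question (the Value union name is my own, and renderRuleComparisons is skipped for brevity):
-- One union type that both kinds of mapValue can be wrapped in
let Value = < State : Text | Comparison : Comparison.Type >

let rando = Comparison::{ Variable = "$.name", StringEquals = Some "Cow" }

-- Both sides of # now share the element type
-- { mapKey : Text, mapValue : Value }, so the concatenation type-checks,
-- and dhall-to-json strips the union constructors on output
in    toMap { Not = Value.Comparison rando }
    # toMap { Next = Value.State "Public" }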
There is a second solution for dealing with weakly-typed JSON schemas that you can learn more about here:
Dhall Manual - How to convert an existing YAML configuration file to Dhall
The basic idea is that all of the JSON/YAML integrations recognize and support a weakly-typed JSON representation that can hold arbitrary JSON data, including dictionaries with keys of different shapes (like in your example). You don't even need to convert the entire expression to this weakly-typed representation; you only need to use it for the subset of your configuration where you run into schema issues.
What this means for your example, is that you would change both arguments to the # operator to have this type:
let Prelude = https://prelude.dhall-lang.org/v12.0.0/package.dhall
in List { mapKey : Text, mapValue : Prelude.JSON.Type }
The documentation for Prelude.JSON.Type also has more details on how to use this type.
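For example, a minimal sketch of Approach 2 for the record from the question (assuming the Prelude's JSON constructors such as JSON.object and JSON.string):
let Prelude = https://prelude.dhall-lang.org/v12.0.0/package.dhall

let JSON = Prelude.JSON

-- Every mapValue is now the same weak type, JSON.Type,
-- so keys of different shapes can share one list
in  toMap
      { Not =
          JSON.object
            ( toMap
                { Variable = JSON.string "$.name"
                , StringEquals = JSON.string "Cow"
                }
            )
      , Next = JSON.string "Public"
      }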

How do I concatenate a dynamic string value to a parsed JSON response in Groovy to get a specific node value?

slurperresponse = new JsonSlurper().parseText(responseContent)
log.info (slurperresponse.WorkItems[0].WorkItemExternalId)
The above code helps me get the node value "WorkItems[0].WorkItemExternalId" using Groovy. Below is the response.
{
"TotalRecordCount": 1,
"TotalPageCount": 1,
"CurrentPage": 1,
"BatchSize": 10,
"WorkItems": [ {
"WorkItemUId": "4336c111-7cd6-4938-835c-3ddc89961232",
"WorkItemId": "20740900",
"StackRank": "0",
"WorkItemTypeUId": "00020040-0200-0010-0040-000000000000",
"WorkItemExternalId": "79853"
} ]
}
I need to append the string "WorkItems[0].WorkItemExternalId" (read from an Excel file), and multiple other such node paths, dynamically to slurperresponse to get the node values, rather than hard-coding slurperresponse.WorkItems[0].WorkItemExternalId.
I tried append and the "+" operator, but I get a compilation error. What other way can I do this?
slurperresponse is an object, not a string; that's why the concatenation does not work.
JsonSlurper creates an object out of the input string. This object is dynamic by nature: you can access it, add fields to it, or alter the existing fields. Concatenation won't work here.
Here is an example:
import groovy.json.*
def text = '{"total" : 2, "students" : [{"name": "John", "age" : 20}, {"name": "Alice", "age" : 21}] }'
def json = new JsonSlurper().parseText(text)
json.total = 3 // alter the value of the existing field
json.city = 'LA' // add a totally new field
json.students[0].age++ // change the field in a list
println json
This yields the output:
[total:3, students:[[name:John, age:21], [name:Alice, age:21]], city:LA]
Now, if I've got you right, you want to add a new student dynamically, and the input is text that you've read from Excel. Here is the example:
json.students << new JsonSlurper().parseText('{"name" : "Tom", "age" : 25}')
// now there are 3 students in the list
Update
It's also possible to get the values without 'hardcoding' the property name:
// option 1
println json.city // prints 'LA'
// option 2
println json.get('city') // prints 'LA' but here 'city' can be a variable
// option 3
println json['city'] // the same as option 2
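To tie this back to the original question, one way to resolve a whole path string such as "WorkItems[0].WorkItemExternalId" dynamically is Groovy's Eval helper. A minimal sketch, where the path variable stands in for the value read from the Excel file:
import groovy.json.JsonSlurper

def slurperresponse = new JsonSlurper().parseText(responseContent)
def path = 'WorkItems[0].WorkItemExternalId' // read from Excel in practice

// Evaluate the path expression against the parsed object, bound to 'x'
def value = Eval.x(slurperresponse, 'x.' + path)
println value // 79853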

Is there a way to randomize the jsonPath array number in get[] myJson?

I have a list of values that I can use for the title field in my JSON request. I would like to store a function in the common.feature file which randomizes the title value when a scenario is executed.
I have attempted using the random number function provided in the commonly needed utilities section of the readme. I have generated a random number successfully; the next step would be using that randomly generated number within the JsonPath line in order to retrieve a value from my data list, which is in JSON.
* def myJson =
"""
{
"title" : {
"type" : "string",
"enum" : [
"MR",
"MRS",
"MS",
"MISS"
[...]
]
}
}
"""
* def randomNumber = random(3)
* def title = get[0] myJson.title.enum
* print title
The code above works but I would like to randomize the number within the get[0]. How is this possible in Karate?
I'm not sure what you want, but can't you just replace 0 with randomNumber, as in get[randomNumber] myJson.title.enum?
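For example, a minimal sketch following that suggestion, assuming the random function from the readme's commonly needed utilities (note the bound should match the enum length, 4 here):
* def random = function(max){ return Math.floor(Math.random() * max) }
* def randomNumber = random(4)
* def title = get[randomNumber] myJson.title.enum
* print title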

Azure tables unable to store flattened JSON

I am using the npm flat package, and arrays/objects are flattened, but the flattened object/array keys end up containing dots (shown quoted, as in 'task_status.0.data') for the object below.
These specific fields do not get stored in Azure Tables; other fields go through, but these are silently ignored. How would I fix this?
var obj1 = {
"studentId": "abc",
"task_status": [
{
"status":"Current",
"date":516760078
},
{
"status":"Late",
"date":1516414446
}
],
"student_plan": "n"
}
Here is how I am using it (simplified code example). Again, the entity successfully gets written to the table, but the flattened properties are missing (see further below):
var flatten = require('flat')
newObj1 = flatten(obj1);
var entGen = azure.TableUtilities.entityGenerator;
newObj1.PartitionKey = entGen.String(uniqueIDFromMyDB);
newObj1.RowKey = entGen.String(uniqueStudentId);
tableService.insertEntity(myTableName, newObj1, myCallbackFunc);
In the above example, the flattened object would look like:
var obj1 = {
studentId: "abc",
'task_status.0.status': 'Current',
'task_status.0.date': 516760078,
'task_status.1.status': 'Late',
'task_status.1.date': 1516414446,
student_plan: "n"
}
Then I would add PartitionKey and RowKey.
All the task_status fields would silently fail to be inserted.
EDIT: This does not have anything to do with the actual flattening process. I just checked a perfectly good JSON object with keys that had 'x.y.z' in them; Azure Tables doesn't seem to accept these column names, which almost completely destroys the value proposition of storing schema-less data without significant rework.
A . in a property name is not supported by Azure Tables. You can use a custom delimiter to flatten your objects instead.
For example:
newObj1 = flatten(obj1, {delimiter: '__'});
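Putting it together with the insert code from the question (a sketch; the other variables are the question's own):
var flatten = require('flat');

// '__' is legal in an Azure Tables property name, '.' is not
var newObj1 = flatten(obj1, { delimiter: '__' });
// e.g. { studentId: 'abc', task_status__0__status: 'Current', ... }

var entGen = azure.TableUtilities.entityGenerator;
newObj1.PartitionKey = entGen.String(uniqueIDFromMyDB);
newObj1.RowKey = entGen.String(uniqueStudentId);
tableService.insertEntity(myTableName, newObj1, myCallbackFunc);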