Postgres: replace an array inside a JSONB field

I have a table whose data column has type JSONB. Among many other values, the JSON document has a notes key that holds an array of notes.
Each note has (at least) two fields: title and content.
Sometimes I have to replace the whole list of notes with a different list without affecting any other fields in the JSON record.
I tried something like this:
UPDATE mytable
SET data = jsonb_set("data", '{notes}', '[{ "title": "foo1" "content": "bar"'}, { "title": "foo2" "content": "bar2"}]', true)
WHERE id = ?
And I get an exception (through a JS wrapper):
error: invalid input syntax for type json
How should I correctly use the jsonb_set function?

You have a stray single quote and two missing commas in your JSON payload. The jsonb_set call itself looks correct; only the JSON literal is malformed.
Instead of
[{ "title": "foo1" "content": "bar"'}, { "title": "foo2" "content": "bar2"}]
                  ^                ^                    ^
it should look like this:
[{ "title": "foo1", "content": "bar"}, { "title": "foo2", "content": "bar2"}]
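One way to avoid this class of typo is to let a JSON serializer build the payload and pass it as a bound parameter instead of typing the literal by hand. The sketch below is in Python rather than the JS wrapper from the question. The %s placeholder style is an assumption (it matches psycopg-style drivers), and is_valid_json is a helper introduced only for this illustration.

```python
import json

# The notes to store. Building them as Python objects means the serializer,
# not a hand-typed string, is responsible for commas and quotes.
notes = [
    {"title": "foo1", "content": "bar"},
    {"title": "foo2", "content": "bar2"},
]
payload = json.dumps(notes)

# Same statement as in the question, with the literal replaced by a bound
# parameter. The %s placeholder style is an assumption about the driver.
UPDATE_NOTES_SQL = """
UPDATE mytable
SET data = jsonb_set(data, '{notes}', %s::jsonb, true)
WHERE id = %s
"""


def is_valid_json(text):
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The literal from the question, reproduced to show that it does not parse.
broken = '[{ "title": "foo1" "content": "bar"\'}, { "title": "foo2" "content": "bar2"}]'

print(is_valid_json(broken))   # False
print(is_valid_json(payload))  # True
```

With a database driver, the statement would then be executed with payload and the row id as the two parameters, so the JSON never has to be embedded in the SQL text.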

Related

Valid string sequences in JSON Schema

Can anyone advise how to code up a JSON Schema document to describe a string that can be one of three possible sequences? Say a string "fruit" can be only one of the following: "apple", "banana" or "coconut".
I was thinking it might be possible to use regex but not sure how to indicate the regex constraint in JSON Schema.
https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.section.6.1
Here is what I have so far:
{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "TestSession",
  "type": "object",
  "properties": {
    "fruit": {
      "type": "string",
      "description": "only three legal possibilities: apple, banana, or coconut"
    }
  }
}
You need to use the enum keyword for this.
The value of this keyword MUST be an array. This array SHOULD have
at least one element. Elements in the array SHOULD be unique.
An instance validates successfully against this keyword if its
value is equal to one of the elements in this keyword's array
value.
Elements in the array might be of any type, including null.
https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-00#section-6.1.2
For example: "enum": ["apple", "banana", "coconut"].
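To make the rule concrete, here is a minimal sketch of the question's schema with an enum constraint added to fruit, plus a hand-rolled check that mirrors the quoted definition. The function fruit_is_allowed is hypothetical and covers only this one property; it is not a JSON Schema validator, and the fruit names use the spelling from the schema's description.

```python
import json

# The schema from the question, closed properly, with "enum" added to "fruit".
SCHEMA = json.loads("""
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "TestSession",
  "type": "object",
  "properties": {
    "fruit": {
      "type": "string",
      "enum": ["apple", "banana", "coconut"]
    }
  }
}
""")


def fruit_is_allowed(instance):
    """Check only the "type" and "enum" keywords of the "fruit" property.

    Illustration only: it mirrors the rule that the value must equal one
    of the elements of the enum array.
    """
    if "fruit" not in instance:
        # "properties" alone does not make a key required.
        return True
    value = instance["fruit"]
    rules = SCHEMA["properties"]["fruit"]
    return isinstance(value, str) and value in rules["enum"]


print(fruit_is_allowed({"fruit": "apple"}))   # True
print(fruit_is_allowed({"fruit": "durian"}))  # False
```

In practice a full validator library would do this work; the jsonschema package is one option in Python.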

Need to relationalise a nested JSON string using Pyspark

I am new to Pyspark and need guidance to perform the following task.
A sample record in the form of a JSON string is given below:
{
  "id": "1234",
  "location": "znd",
  "contact": "{\"phone\": [{\"number\":\"12345\",\"code\":\"111\",\"altno\":\"No\"},{\"number\":\"55656\",\"code\":\"222\",\"altno\":\"Yes\"}]}"
}
This needs to be relationalized as follows. As shown below, one row of input becomes two rows:
{"id": "1234", "location": "znd", "number": "12345", "code": "111", "altno": "No"}
{"id": "1234", "location": "znd", "number": "55656", "code": "222", "altno": "Yes"}
I have tried to use the explode function but as this is a JSON string, explode does not work on it.
I have read the data into a DF and tried to enforce a struct type to later use explode, but that does not work either.
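The transformation described above has two steps: parse the string held in contact into a structure, then emit one row per element of its phone array. The sketch below shows that logic in plain Python on the sample record, so it runs without a Spark session. The function name relationalize is hypothetical, and this is not PySpark code.

```python
import json

# Sample record from the question. "contact" is a JSON document stored as a
# string, so it has to be parsed before its array can be expanded.
raw = r'''
{
  "id": "1234",
  "location": "znd",
  "contact": "{\"phone\": [{\"number\":\"12345\",\"code\":\"111\",\"altno\":\"No\"},{\"number\":\"55656\",\"code\":\"222\",\"altno\":\"Yes\"}]}"
}
'''


def relationalize(record):
    """Return one flat row per phone entry (hypothetical helper)."""
    # Step 1: turn the embedded JSON string into a dict.
    contact = json.loads(record["contact"])
    # Step 2: copy the parent fields onto every phone entry.
    parent = {k: v for k, v in record.items() if k != "contact"}
    return [{**parent, **phone} for phone in contact["phone"]]


rows = relationalize(json.loads(raw))
for row in rows:
    print(row)
```

In PySpark the same two steps would likely map to from_json (with an explicit schema for the contact string) followed by explode on the resulting phone array; both functions live in pyspark.sql.functions.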

How to edit a json dictionary in Robot Framework

I am currently implementing some test automation that uses a json POST to a REST API to initialize the test data in the SUT. Most of the fields I don't have an issue editing using information I found in another thread: Json handling in ROBOT
However, one of the sets of information I am editing is a dictionary of meta data.
{
  "title": "Test Auotmation Post 2018-03-06T16:12:02Z",
  "content": "dummy text",
  "excerpt": "Post made by automation for testing purposes.",
  "name": "QA User",
  "status": "publish",
  "date": "2018-03-06T16:12:02Z",
  "primary_section": "Entertainment",
  "taxonomy": {
    "section": [
      "Entertainment"
    ]
  },
  "coauthors": [
    {
      "name": "QA User - CoAuthor",
      "meta": {
        "Title": "QA Engineer",
        "Organization": "That One Place"
      }
    }
  ],
  "post_meta": [
    {
      "key": "credit",
      "value": "QA Engineer"
    },
    {
      "key": "pub_date",
      "value": "2018-03-06T16:12:02Z"
    },
    {
      "key": "last_update",
      "value": "2018-03-06T16:12:02Z"
    },
    {
      "key": "source",
      "value": "wordpress"
    }
  ]
}
Is it possible to use the Set to Dictionary Keyword on a dictionary inside a dictionary? I would like to be able to edit the value of the pub_date and last_update inside of post_meta, specifically.
The most straightforward way would be to use the Evaluate keyword and set the sub-dict value in it. Presuming you are working with a dictionary called ${value}:
Evaluate    $value['post_meta'][1].update({'value': 'your new value here'})
Here index 1 is the post_meta entry whose key is pub_date. Evaluate takes an expression, so a plain assignment statement would not work there; update changes the entry in place.
I won't get into how to find the index of the post_meta entry that has the 'key' with value 'pub_date', as that's not part of your question.
Is it possible to use the Set to Dictionary Keyword on a dictionary inside a dictionary?
Yes, it's possible.
However, because post_meta is a list rather than a dictionary, you will have to write some code to iterate over all of the values of post_meta until you find one with the key you want to update.
You could do this in Python quite simply. You could also write a keyword in Robot to do that for you. Here's an example:
*** Keywords ***
Set list element by key
    [Arguments]    ${data}    ${target_key}    ${new_value}
    :FOR    ${item}    IN    @{data}
    \    run keyword if    '''${item['key']}''' == '''${target_key}'''
    \    ...    set to dictionary    ${item}    value=${new_value}
    [return]    ${data}
Assuming you have a variable named ${data} that contains the original JSON data as a string, you could call this keyword as follows:
${JSON}=    evaluate    json.loads('''${data}''')    json
set list element by key    ${JSON['post_meta']}    pub_date    yesterday
set list element by key    ${JSON['post_meta']}    last_update    today
You will then have a Python object in ${JSON} with the modified values.
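For the Python route mentioned above, a minimal sketch of the same loop might look like this. The function name set_list_element_by_key mirrors the Robot keyword but is otherwise hypothetical, and the sample document is a trimmed copy of the one in the question.

```python
import json


def set_list_element_by_key(items, target_key, new_value):
    """Set "value" on each entry of items whose "key" equals target_key.

    Returns the number of entries that were updated.
    """
    updated = 0
    for item in items:
        if item.get("key") == target_key:
            item["value"] = new_value
            updated += 1
    return updated


# Trimmed copy of the document from the question.
data = """
{
  "status": "publish",
  "post_meta": [
    {"key": "credit", "value": "QA Engineer"},
    {"key": "pub_date", "value": "2018-03-06T16:12:02Z"},
    {"key": "last_update", "value": "2018-03-06T16:12:02Z"},
    {"key": "source", "value": "wordpress"}
  ]
}
"""

document = json.loads(data)
set_list_element_by_key(document["post_meta"], "pub_date", "yesterday")
set_list_element_by_key(document["post_meta"], "last_update", "today")
print(json.dumps(document["post_meta"], indent=2))
```

A function like this could also be exposed to Robot as a keyword through a Python library file, which keeps the iteration out of the test case itself.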

How to validate number of properties in JSON schema

I am trying to create a schema for a piece of JSON and have slimmed down an example of what I am trying to achieve.
I have the following JSON schema:
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Set name",
  "description": "The example schema",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    }
  },
  "additionalProperties": false
}
The following JSON is classed as valid when compared to the schema:
{
  "name": "W",
  "name": "W"
}
I know that there should be a warning about the two fields having the same name, but is there a way to force validation to fail if the above is submitted? I want it to validate only when there is exactly one occurrence of the field 'name'.
This is outside the responsibility of JSON Schema. JSON Schema is built on top of JSON, and in JSON the behavior of duplicate properties in an object is undefined. If you want a warning about this, run the document through a separate check on the raw JSON before passing it to a JSON Schema validator.
There is a maxProperties constraint that can limit the total number of properties in an object.
However, data with duplicated properties is a tricky case, because many JSON decoding implementations silently ignore duplicates.
So your JSON Schema validation library would not even know that a duplicate existed.
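As an illustration of such a separate pre-validation step, the sketch below uses the object_pairs_hook parameter of Python's json.loads to reject repeated property names before any schema validation happens. The names DuplicateKeyError, reject_duplicates and loads_strict are introduced only for this example.

```python
import json


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a property name."""


def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, failing on a repeated key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise DuplicateKeyError(f"duplicate property: {key!r}")
        seen[key] = value
    return seen


def loads_strict(text):
    """Parse JSON, raising DuplicateKeyError if any object repeats a key."""
    return json.loads(text, object_pairs_hook=reject_duplicates)


document = '{"name": "W", "name": "W"}'

# The default decoder keeps only one value, so a validator that receives
# this dict never sees the duplicate.
print(json.loads(document))

try:
    loads_strict(document)
except DuplicateKeyError as exc:
    print(exc)
```

The hook is called for every object in the document, including nested ones, so the check is not limited to the top level.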

Reading Inconsistent Nested JSON in Athena

In Athena, I am reading some nested JSON files into a table. The field that actually contains the nested JSON has an inconsistent number of fields within it across the different files in the raw data.
Sometimes the data looks something like this:
{
  "id": "9f1e07b4",
  "date": "05/20/2018 02:30:53.110 AM",
  "data": {
    "a": "asd",
    "b": "adf",
    "body": {
      "sid": {
        "uif": "yes",
        "sidd": "no",
        "state": "idle"
      }
    },
    "category": "scene"
  }
}
Other times the data looks something like this:
{
  "id": "9f1e07b4",
  "date": "05/20/2018 02:30:45.436 AM",
  "data": {
    "a": "event",
    "b": "state",
    "body": {
      "persona": {
        "one": {
          "movement": "idle"
        }
      }
    },
    "category": "scene"
  }
}
Other times the "body" field contains both the "sid" struct and the "persona" struct.
As you can see the fields given within "body" are not always consistent. I tried to add all of the possible fields and their structures within my CREATE EXTERNAL TABLE query. However, the "data" column that contains the "body" field still does not fill and remains blank when I "preview table" in Athena.
In the CREATE TABLE DDL, is there a way to indicate that I want to fill all of the columns that aren't present in the nested JSON of each file with null values?
Furthermore, the 'names' given to the fields in the query do not have to correspond to the key values in the raw JSON. It seems Athena is simply reading the structure and nothing else. Is there a way to indicate which JSON key corresponds to which Athena field name directly? So that if some fields are missing from the "body" of one file, Athena can know which one is missing and fill it in as null?