JSON Schema validation in PostgreSQL?

I can't find any information about JSON Schema validation in PostgreSQL. Is there any way to implement JSON Schema validation on the PostgreSQL JSON data type?

There is another PostgreSQL extension that implements JSON validation. Its usage is almost the same as that of Postgres-JSON-schema (the extension described in the next answer):
CREATE TABLE example (id serial PRIMARY KEY, data jsonb);
-- use is_jsonb_valid instead of validate_json_schema
ALTER TABLE example ADD CONSTRAINT data_is_valid CHECK (is_jsonb_valid('{"type": "object"}', data));
INSERT INTO example (data) VALUES ('{}');
-- INSERT 0 1
INSERT INTO example (data) VALUES ('1');
-- ERROR: new row for relation "example" violates check constraint "data_is_valid"
-- DETAIL: Failing row contains (2, 1).
I've done some benchmarking validating tweets, and it is 20x faster than Postgres-JSON-schema, mostly because it is written in C rather than PL/pgSQL.
Disclaimer: I wrote this extension.

There is a PostgreSQL extension that implements JSON Schema validation in PL/PgSQL.
It is used like this (taken from the project README file):
CREATE TABLE example (id serial PRIMARY KEY, data jsonb);
ALTER TABLE example ADD CONSTRAINT data_is_valid CHECK (validate_json_schema('{"type": "object"}', data));
INSERT INTO example (data) VALUES ('{}');
-- INSERT 0 1
INSERT INTO example (data) VALUES ('1');
-- ERROR: new row for relation "example" violates check constraint "data_is_valid"
-- DETAIL: Failing row contains (2, 1).
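The same function also accepts richer schemas; here is a quick sketch of my own (the schema and table below are not from the README):
CREATE TABLE people (id serial PRIMARY KEY, data jsonb);
ALTER TABLE people ADD CONSTRAINT data_is_valid CHECK (validate_json_schema(
    '{"type": "object", "required": ["age"], "properties": {"age": {"type": "number", "minimum": 21}}}',
    data));
INSERT INTO people (data) VALUES ('{"age": 30}');  -- accepted
INSERT INTO people (data) VALUES ('{"age": 18}');  -- rejected: violates check constraint "data_is_valid"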

What you need is something to translate JSON Schema constraints into PostgreSQL ones, e.g.:
{
  "properties": {
    "age": {"minimum": 21}
  },
  "required": ["age"]
}
to:
SELECT ... FROM ...
WHERE (elem->>'age')::numeric >= 21
I'm not aware of any existing tools. I know of something similar for MySQL which might be useful for writing your own, but nothing for using the JSON type in PostgreSQL.
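If you only need a handful of rules, hand-translating them into a CHECK constraint may be good enough. A rough sketch of my own, assuming a jsonb column named data (the cast is needed because ->> returns text):
CREATE TABLE users (id serial PRIMARY KEY, data jsonb);
-- Hand-translated from the schema above: "age" is required and must be at least 21.
ALTER TABLE users ADD CONSTRAINT age_is_valid CHECK (
    data ? 'age'
    AND (data->>'age')::numeric >= 21
);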

Related

Using an unstructured JSON column for Clickhouse v22.3

I'm using the new JSON column for Clickhouse, which was added in version 22.3.
There is a great blog post here on the Clickhouse website about it - https://clickhouse.com/blog/clickhouse-newsletter-april-2022-json-json-json/
I'm trying to add unstructured JSON, where the document type isn't known until it's inserted. I've been using Postgres with JSONB and Snowflake with VARIANT for this and it's been working great.
With Clickhouse (v22.4.5.9, current as of 2022-05-14), here is what I'm doing:
-- We need to enable this flag to use JSON, as it's currently (as of 2022-05-14) experimental.
set allow_experimental_object_type = 1;
-- Create an example table for our testing, we can use the Memory engine as it'll be tiny.
create table example_json (
    json_data json
)
engine = Memory();
-- Now let's insert two different JSON documents, usually this would be batched, but for the sake of this
-- example, let's just use two inserts.
-- insert into example_json(json)
INSERT INTO example_json VALUES ('{"animal": "dog"}');
-- Returns ('dog'), great.
select * from example_json;
-- Returns "dog", even cooler.
select json_data.animal from example_json;
-- Now we want to change around the values
INSERT INTO example_json VALUES ('{"name": "example", "animal": {"breed": "cat"}}');
This throws the following error:
Code: 15. DB::Exception: Data in Object has ambiguous paths: 'animal.breed' and 'animal'. (DUPLICATE_COLUMN) (version 22.4.5.9 (official build))
I think that under the hood Clickhouse is converting the keys to column types, but won't change the type if a conflicting type is then created?
Is there a way to insert JSON like this to Clickhouse?
It is precisely as you described. Clickhouse tries to infer the types of all the columns on INSERT, and the JSON column is internally represented as a Tuple.
You can check what type is currently inferred by running:
SET describe_extend_object_types=1;
DESCRIBE example_json;
You will see that this table already has a column called animal, hence CH will report it as a duplicate:
DESCRIBE TABLE example_json
SETTINGS describe_extend_object_types = 1
Query id: 884a9a85-d883-45b9-8c90-f957a39a995e
┌─name──────┬─type─────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ json_data │ Tuple(animal String) │ │ │ │ │ │
└───────────┴──────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
1 row in set. Elapsed: 0.001 sec.
You can find more details about this here: https://clickhouse.com/docs/en/guides/developer/working-with-json/json-semi-structured#handling-data-changes
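If the document shapes really are unpredictable, one common fallback (my own sketch, not taken verbatim from that page) is to keep the raw document in a String column and pull fields out at query time with the JSON functions:
-- No type inference happens here, so documents with different shapes can coexist.
CREATE TABLE example_json_raw (
    json_data String
)
ENGINE = Memory();

INSERT INTO example_json_raw VALUES ('{"animal": "dog"}');
INSERT INTO example_json_raw VALUES ('{"name": "example", "animal": {"breed": "cat"}}');

-- Keys that are missing (or not strings) come back as empty strings.
SELECT
    JSONExtractString(json_data, 'animal') AS animal,
    JSONExtractString(json_data, 'animal', 'breed') AS breed
FROM example_json_raw;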
Can you try the parse_json function within Snowflake to insert the values into the JSON (VARIANT) table?
https://docs.snowflake.com/en/sql-reference/functions/parse_json.html#examples
Basically, try this DML:
INSERT INTO example_json SELECT parse_json($$ {"name": "example", "animal": {"breed": "cat"}} $$);

MySQL JSON Schema

I have a table in MySQL somewhat like this:
create table json_docs (
    id int auto_increment primary key,
    doc json not null
);
I also have a JSON Schema that this doc column obeys:
{
  "$id": "https://my.org/namespace/doc.schema.json",
  "type": "object",
  "properties": {...}
}
Is there a way to enforce that schema in MySql during an insert or update?
insert into json_docs (doc) values ('{...}')
I am also using PHP Laravel. If there is a way to do this in the Model layer of Laravel I would also be interested.
Thanks in advance for your help!
MySQL does have JSON Schema validation support: the JSON_SCHEMA_VALID and JSON_SCHEMA_VALIDATION_REPORT functions were added in 8.0.17.
https://dev.mysql.com/doc/refman/8.0/en/json-validation-functions.html
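For example, you could enforce the schema at insert/update time by combining JSON_SCHEMA_VALID with a CHECK constraint (a sketch; the inline schema here is a simplified stand-in for your real one):
-- Reject any row whose doc does not validate against the schema (MySQL 8.0.17+).
ALTER TABLE json_docs
  ADD CONSTRAINT doc_matches_schema
  CHECK (JSON_SCHEMA_VALID('{"type": "object", "required": ["name"]}', doc));

-- Fails the constraint because "name" is missing:
INSERT INTO json_docs (doc) VALUES ('{"age": 42}');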
MySQL does not have any built-in feature to validate JSON documents against a schema.
Update: this feature now exists in 8.0.17, but prior to that version the following answer still applies:
You would have to validate your document in your app, independently of inserting it into a database.
See https://github.com/justinrainbow/json-schema for an open-source JSON validator in PHP.

How to create a table without a schema in BigQuery via the API?

Simply speaking, I would like to create a table with a given name, providing only the data.
I have some JUnit tests with sample data (JSON files).
At the moment I have to provide a schema for those files in order to create tables for them.
I suspect I shouldn't need to provide those schemas.
Why? Because in the BigQuery console I can create a table from a query (even one as simple as select 1, 'test'), or I can upload a JSON file and have the schema autodetected, so I should probably be able to do the same programmatically.
I saw https://chartio.com/resources/tutorials/how-to-create-a-table-from-a-query-in-google-bigquery/#using-the-api and know that I could turn the JSON data into queries and run them with the Jobs.insert API, but that feels over-engineered and has other disadvantages, e.g. boilerplate code.
After some research I found a possibly simpler way of creating the table on the fly, but it doesn't work for me; here is the code:
Insert insert = bigquery.jobs().insert(projectId,
    new Job().setConfiguration(
        new JobConfiguration().setLoad(
            new JobConfigurationLoad()
                .setSourceFormat("NEWLINE_DELIMITED_JSON")
                .setDestinationTable(
                    new TableReference()
                        .setProjectId(projectId)
                        .setDatasetId(dataSetId)
                        .setTableId(tableId)
                )
                .setCreateDisposition("CREATE_IF_NEEDED")
                .setWriteDisposition(writeDisposition)
                .setSourceUris(Collections.singletonList(sourceUri))
                .setAutodetect(true)
        )
    ));
Job myInsertJob = insert.execute();
The JSON file used as source data (pointed to by sourceUri) looks like this:
[
  {
    "stringField1": "value1",
    "numberField2": "123456789"
  }
]
Even though I used setCreateDisposition("CREATE_IF_NEEDED"), I still receive the error: "Not found: Table ..."
Is there any other method in the API, or a better approach than the above, that avoids providing the schema?
The code in your question is perfectly fine, and it does create the table if it doesn't exist. However, it fails when you use a partition id in place of the table id, i.e. when the destination table id is "table$20170323", which is what you used in your job. In order to write to a partition, you have to create the table first.

How to do JSON data validation with stored procedures in PostgreSQL

I have a SQL table in a PostgreSQL database with multiple JSON-typed columns.
We want to do some kind of validation on those JSON columns before inserting the data.
Our goal is to do the validation with a stored procedure that runs NoSQL-style queries against the JSON and returns an error in case any validation on a JSON column fails.
Example:
Let's say we have 5 or 6 JSON columns in a User table, such as PersonalInformation, Accounts, OwnerDetails, etc., and these columns hold complex JSON, such as lists of JSON objects or JSON nested within JSON.
Now we want to validate each column one by one and, in case of an error, return a unique error code.
When using the JSON type, we can easily define constraints on the JSON fields. Here is a basic demonstration:
CREATE FUNCTION is_valid_description(descr text) RETURNS BOOLEAN AS
'SELECT length($1) > 10;'
LANGUAGE sql;
CREATE TABLE wizards (
    id SERIAL,
    data JSON,
    CONSTRAINT validate_name CHECK ((data->>'name') IS NOT NULL AND length(data->>'name') > 0),
    CONSTRAINT validate_description CHECK ((data->>'description') IS NOT NULL AND (is_valid_description(data->>'description')))
);
We check for constraints on two JSON fields of the data column. In one case, we use a stored procedure written in SQL, but we could write more complex functions with PL/pgSQL, or in Perl or Python.
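For example, a PL/pgSQL function can raise a distinct error code for each failed rule, which is one way to get the per-column unique error codes asked about above (a sketch only; the accounts field and the JV001/JV002 codes are invented for illustration):
CREATE FUNCTION is_valid_accounts(accounts json) RETURNS boolean AS $$
BEGIN
    -- Rule 1: the field must be present and must be a JSON array.
    IF accounts IS NULL OR json_typeof(accounts) <> 'array' THEN
        RAISE EXCEPTION 'accounts must be a JSON array'
            USING ERRCODE = 'JV001';
    END IF;
    -- Rule 2: the array must not be empty.
    IF json_array_length(accounts) = 0 THEN
        RAISE EXCEPTION 'accounts must not be empty'
            USING ERRCODE = 'JV002';
    END IF;
    RETURN true;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Attached the same way as the constraints above, e.g.:
-- ALTER TABLE wizards ADD CONSTRAINT validate_accounts CHECK (is_valid_accounts(data->'accounts'));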
We may also want to enforce uniqueness on one of the fields:
CREATE UNIQUE INDEX ui_wizards_name ON wizards((data->>'name'));
Validation and uniqueness constraints are then enforced when inserting or updating rows.
-- violates check constraint "validate_description"
INSERT INTO wizards(data) VALUES('{
"name": "Kirikou",
"description": "A witch"
}');
-- passes
INSERT INTO wizards(data) VALUES('{
"name": "Kirikou",
"description": "A witch of African descent"
}');
-- violates unique constraint "ui_wizards_name"
INSERT INTO wizards(data) VALUES('{
"name": "Kirikou",
"description": "The same witch of African descent"
}');
Code to play with on http://sqlfiddle.com/#!15/23974/2.

SQLAlchemy and generating ALTER TABLE statements

I want to programmatically generate ALTER TABLE statements in SQLAlchemy to add a new column to a table. The column to be added should take its definition from an existing mapped class.
So, given a SQLAlchemy Column instance, can I generate the SQL schema definition(s) I would need for ALTER TABLE ... ADD COLUMN ... and CREATE INDEX ...?
I've played at a Python prompt and been able to see a human-readable description of the data I'm after:
>>> DBChain.__table__.c.rName
Column('rName', String(length=40, convert_unicode=False, assert_unicode=None, unicode_error=None, _warn_on_bytestring=False), table=<Chain>)
When I call engine.create_all() the debug log includes the SQL statements I'm looking to generate:
CREATE TABLE "Chain" (
...
"rName" VARCHAR(40),
...
)
CREATE INDEX "ix_Chain_rName" ON "Chain" ("rName")
I've heard of sqlalchemy-migrate, but that seems to be built around static changes, whereas I'm looking to dynamically generate schema changes.
(I'm not interested in defending this design, I'm just looking for a dialect-portable way to add a column to an existing table.)
After tracing engine.create_all() with a debugger I've discovered a possible answer:
>>> engine.dialect.ddl_compiler(
... engine.dialect,
... DBChain.__table__.c.rName ) \
... .get_column_specification(
... DBChain.__table__.c.rName )
'"rName" VARCHAR(40)'
The index can be created with:
# Assuming "import sqlalchemy as sa"; rTableName holds the table's name ("Chain" in this example).
sColumnElement = DBChain.__table__.c.rName
if sColumnElement.index:
    sIndex = sa.schema.Index(
        "ix_%s_%s" % (rTableName, sColumnElement.name),
        sColumnElement,
        unique=sColumnElement.unique)
    sIndex.create(engine)