How to do JSON data validation with stored procedures in PostgreSQL

I have one SQL table in a PostgreSQL database with multiple JSON-type columns.
We want to do some kind of validation on those JSON columns before inserting the data.
Our target is to have validation using a stored procedure, which runs noSQL-style queries and returns an error in case of any validation failure on that JSON column.
Example -
let's say we have 5 to 6 JSON columns in a User table,
JSON columns like -
Personal Information, Accounts, OwnerDetails etc., and these columns contain complex JSON such as lists of JSON objects or JSON nested within JSON.
Now we want to validate each column one by one and, in case of an error, return a unique error code.

When using the JSON type, we can easily define constraints on the JSON fields. Here is a basic demonstration:
CREATE FUNCTION is_valid_description(descr text) RETURNS BOOLEAN AS
'SELECT length($1) > 10;'
LANGUAGE sql;

CREATE TABLE wizards (
    id SERIAL,
    data JSON,
    CONSTRAINT validate_name CHECK ((data->>'name') IS NOT NULL AND length(data->>'name') > 0),
    CONSTRAINT validate_description CHECK ((data->>'description') IS NOT NULL AND (is_valid_description(data->>'description')))
);
We check constraints on two JSON fields of the data column. In one case we use a stored function written in SQL, but we could write more complex functions in PL/pgSQL, or in Perl or Python.
We may also want to enforce uniqueness on one of the fields:
CREATE UNIQUE INDEX ui_wizards_name ON wizards((data->>'name'));
Validation and uniqueness constraints are then enforced when inserting or updating rows.
-- violates check constraint "validate_description"
INSERT INTO wizards(data) VALUES('{
"name": "Kirikou",
"description": "A witch"
}');
-- passes
INSERT INTO wizards(data) VALUES('{
"name": "Kirikou",
"description": "A witch of African descent"
}');
-- violates unique constraint "ui_wizards_name"
INSERT INTO wizards(data) VALUES('{
"name": "Kirikou",
"description": "The same witch of African descent"
}');
Code to play with on http://sqlfiddle.com/#!15/23974/2.
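The question also asks for a unique error code per JSON column. One way to get that, beyond plain CHECK constraints, is to wrap the per-column checks in a PL/pgSQL function that raises a distinct error for each failure. This is only a minimal sketch: the column names (personal_information, accounts) and the ERR_* codes are assumptions, not taken from the question.

CREATE OR REPLACE FUNCTION validate_user_json(personal_information JSON, accounts JSON)
RETURNS void AS $$
BEGIN
    -- hypothetical check on the "Personal Information" column
    IF personal_information->>'firstName' IS NULL THEN
        RAISE EXCEPTION 'ERR_PI_001: personal_information.firstName is missing';
    END IF;
    -- hypothetical check on the "Accounts" column
    IF json_typeof(accounts) IS DISTINCT FROM 'array' THEN
        RAISE EXCEPTION 'ERR_ACC_001: accounts must be a JSON array';
    END IF;
END;
$$ LANGUAGE plpgsql;

-- a BEFORE INSERT trigger function on the User table could then run:
--   PERFORM validate_user_json(NEW.personal_information, NEW.accounts);

Each RAISE aborts the insert and surfaces the column-specific error code to the caller.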

Related

How to reuse JSON arguments within PostgreSQL stored procedure

I am using a stored procedure to INSERT into and UPDATE a number of tables. Some of the data is derived from a JSON parameter.
Although I have successfully used json_to_recordset() to extract named data from the JSON parameter, I cannot figure out how to use it in an UPDATE statement. Also, I need to use some items of data from the JSON parameter a number of times.
Q: Is there a way to use json_to_recordset() to extract named data to a temporary table to allow me to reuse the data items throughout my stored procedure? Maybe I should SELECT INTO variables within the stored procedure?
Q: Failing that, can anyone please provide a simple example of how to update a table using data returned from json_to_recordset()? I must also include data not from the JSON parameter, such as now()::timestamp(0).
This is how I have used json_to_recordset() so far:
INSERT INTO myRealTable (
    rec_timestamp,
    rec_data1,
    rec_data2
)
SELECT
    now()::timestamp(0),
    x.json_data1,
    x.json_data2
FROM json_to_recordset(json_parameter) AS x (
    json_data1 int,
    json_data2 boolean
);
Thank you.
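One way to reuse the extracted records across several statements is to stage them in a temporary table inside the procedure (SELECT ... INTO variables also works when the JSON holds a single record). The following is only a sketch: myRealTable and its columns come from the snippet above, while myOtherTable and its columns are made up for illustration.

-- inside the PL/pgSQL procedure that receives json_parameter:
CREATE TEMP TABLE json_rows (json_data1 int, json_data2 boolean) ON COMMIT DROP;

INSERT INTO json_rows
SELECT x.json_data1, x.json_data2
FROM json_to_recordset(json_parameter) AS x (json_data1 int, json_data2 boolean);

-- reuse the staged rows for the INSERT ...
INSERT INTO myRealTable (rec_timestamp, rec_data1, rec_data2)
SELECT now()::timestamp(0), json_data1, json_data2
FROM json_rows;

-- ... and again for an UPDATE, mixing in non-JSON values such as now()
UPDATE myOtherTable t
SET other_flag    = j.json_data2,
    rec_timestamp = now()::timestamp(0)
FROM json_rows j
WHERE t.other_id = j.json_data1;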

MySql JSON Schema

I have a table in MySql somewhat like this:
create table json_docs (
    id int auto_increment primary key,
    doc json not null
);
I also have a json schema that this doc column obeys.
{
"$id": "https://my.org/namespace/doc.schema.json",
"type": "object",
"properties": {...}
}
Is there a way to enforce that schema in MySql during an insert or update?
insert into json_docs (doc) values ('{...}')
I am also using PHP Laravel. If there is a way to do this in the Model layer of Laravel I would also be interested.
Thanks in advance for your help!
MySQL does have JSON validation support; see the JSON validation functions:
https://dev.mysql.com/doc/refman/8.0/en/json-validation-functions.html
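For example, on MySQL 8.0.17 or later you can combine JSON_SCHEMA_VALID() with a CHECK constraint so the schema is enforced on every insert and update. A minimal sketch against the json_docs table from the question (the inline schema is only illustrative, not the one behind the "$id" above):

ALTER TABLE json_docs
    ADD CONSTRAINT doc_matches_schema
    CHECK (JSON_SCHEMA_VALID(
        '{"type": "object", "required": ["name"], "properties": {"name": {"type": "string"}}}',
        doc
    ));

Inserts or updates whose doc value does not match the schema then fail the check constraint.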
Update: This feature now exists in MySQL 8.0.17, but prior to that version, the following answer still applies:
MySQL does not have any built-in feature to validate JSON documents against a schema. You would have to validate your document in your app, independently of inserting it into the database.
See https://github.com/justinrainbow/json-schema for an open-source JSON Schema validator in PHP.

How to create table without schema in BigQuery by API?

Simply speaking, I would like to create a table with a given name, providing only the data.
I have some JUnit tests with sample data (JSON files).
I have to provide a schema for those files to create tables for them.
I suppose that I shouldn't need to provide those schemas.
Why? Because in the BigQuery console I can create a table from a query (even one as simple as select 1, 'test'), or I can upload JSON to create a table with schema autodetection, so I could probably also do it programmatically.
I saw https://chartio.com/resources/tutorials/how-to-create-a-table-from-a-query-in-google-bigquery/#using-the-api and know that I could parse the JSON data into queries and use the Jobs.insert API to run them, but that is over-engineered and has other disadvantages, e.g. boilerplate code.
After some research I found a possibly simpler way of creating a table on the fly, but it doesn't work for me; code below:
Insert insert = bigquery.jobs().insert(projectId,
    new Job().setConfiguration(
        new JobConfiguration().setLoad(
            new JobConfigurationLoad()
                .setSourceFormat("NEWLINE_DELIMITED_JSON")
                .setDestinationTable(
                    new TableReference()
                        .setProjectId(projectId)
                        .setDatasetId(dataSetId)
                        .setTableId(tableId)
                )
                .setCreateDisposition("CREATE_IF_NEEDED")
                .setWriteDisposition(writeDisposition)
                .setSourceUris(Collections.singletonList(sourceUri))
                .setAutodetect(true)
        )
    ));
Job myInsertJob = insert.execute();
JSON file which is used as a source data is pointed by sourceUri, looks like:
[
    {
        "stringField1": "value1",
        "numberField2": "123456789"
    }
]
Even though I used setCreateDisposition("CREATE_IF_NEEDED"), I still receive the error: "Not found: Table ..."
Is there any other method in API or better approach than above to exclude schema?
The code in your question is perfectly fine, and it does create the table if it doesn't exist. However, it fails when you use a partition id in place of the table id, i.e. when the destination table id is "table$20170323", which is what you used in your job. In order to write to a partition, you have to create the table first.

JSON Schema validation in PostgreSQL?

I can't find any information about JSON schema validation in PostgreSQL, is there any way to implement JSON Schema validation on PostgreSQL JSON data type?
There is another PostgreSQL extension that implements JSON validation. The usage is almost the same as that of "Postgres-JSON-schema":
CREATE TABLE example (id serial PRIMARY KEY, data jsonb);
-- use is_jsonb_valid instead of validate_json_schema
ALTER TABLE example ADD CONSTRAINT data_is_valid CHECK (is_jsonb_valid('{"type": "object"}', data));
INSERT INTO example (data) VALUES ('{}');
-- INSERT 0 1
INSERT INTO example (data) VALUES ('1');
-- ERROR: new row for relation "example" violates check constraint "data_is_valid"
-- DETAIL: Failing row contains (2, 1).
I've done some benchmarking validating tweets and it is 20x faster than "Postgres-JSON-schema", mostly because it is written in C instead of SQL.
Disclaimer, I've written this extension.
There is a PostgreSQL extension that implements JSON Schema validation in PL/pgSQL.
It is used like this (taken from the project README file):
CREATE TABLE example (id serial PRIMARY KEY, data jsonb);
ALTER TABLE example ADD CONSTRAINT data_is_valid CHECK (validate_json_schema('{"type": "object"}', data));
INSERT INTO example (data) VALUES ('{}');
-- INSERT 0 1
INSERT INTO example (data) VALUES ('1');
-- ERROR: new row for relation "example" violates check constraint "data_is_valid"
-- DETAIL: Failing row contains (2, 1).
What you need is something to translate JSON Schema constraints into PostgreSQL ones, e.g.:
{
    "properties": {
        "age": {"minimum": 21}
    },
    "required": ["age"]
}
to:
SELECT FROM ...
WHERE (elem->>'age')::numeric >= 21
I'm not aware of any existing tools. I know of something similar for MySQL which might be useful for writing your own, but nothing for using the JSON type in PostgreSQL.
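As a hand-written sketch of that translation, reusing the example table with a jsonb data column from the answers above, the schema could become a check constraint like:

ALTER TABLE example ADD CONSTRAINT age_is_valid CHECK (
    data ? 'age'                       -- "required": ["age"]
    AND (data->>'age')::numeric >= 21  -- "minimum": 21
);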

Delete entry in couchbase bucket using key in the form of regex

I have a requirement wherein I have to delete an entry from a Couchbase bucket. I use the delete method of the CouchbaseClient from my Java application, to which I pass the key. But in one particular case I don't have the entire key name, only a part of it. So I thought there would be a method that takes a matcher, but I could not find one. The following is the actual key that is stored in the bucket:
123_xyz_havefun
and the part of the key that I have is xyz. I am not sure whether this can be done. Can anyone help?
The DELETE operation in Couchbase supports neither wildcards nor regular expressions, so you have to get the list of keys somehow and pass them to the function. For example, you might use Couchbase Views, or maintain your own list of keys via the APPEND command: create a key such as xyz and append every matching key to its value during the application's lifetime, flushing this key after the real delete request.
Well, I think you can achieve delete using a wildcard or regex-like expression.
The above answers basically say:
- Query the data from Couchbase
- Iterate over the result set
- Fire a delete for each key of interest
However, I believe a delete on the server should be a single delete on the server, rather than requiring the three steps above.
In this regard, old-fashioned RDBMSs were better: all you need to do is fire a SQL query like DELETE FROM the_table WHERE something LIKE 'match%'.
Fortunately, there is something similar to SQL available in Couchbase, called N1QL (pronounced "nickel"). I am not aware of the JavaScript (and other language) syntax, but this is how I did it in Python.
Query to be used: DELETE from b where META(b).id LIKE "%"
layer_name_prefix = cb_layer_key + "|" + "%"
query = ""
try:
    query = N1QLQuery('DELETE from `test-feature` b where META(b).id LIKE $1', layer_name_prefix)
    cb.n1ql_query(query).execute()
except CouchbaseError, e:
    logger.exception(e)
To achieve the same thing, an alternative query could be the one below, if you are storing 'type' and/or other metadata such as 'parent_id':
DELETE from where type='Feature' and parent_id=8;
But I prefer the first version of the query, as it operates on the key, and I believe Couchbase must have some internal indexes to operate/query faster on keys (and other metadata).
Although it is true you cannot iterate over documents with a regex, you could create a new view and have your map function only emit keys that match your regex.
An (obviously contrived and awful regex) example map function could be:
function(doc, meta) {
    if (meta.id.match(/_xyz_/)) {
        emit(meta.id, null);
    }
}
An alternative idea would be to extract that portion of the key from each document and then emit that. That would allow you to use the same index to match different documents by that particular key form.
function(doc, meta) {
    var match = meta.id.match(/^.*_(...)_.*$/);
    if (match) {
        emit(match[1], null);
    }
}
In your case, this would emit the key xyz (or the corresponding component from each key) for each document. You could then just use startkey and endkey to limit based on your criteria.
Lastly, there are a ton of options from the information retrieval research space for building text indexes that could apply here. I'll refer you to this doc on permuterm indexes to get you started.