Regex for json schema pattern - json

I would like to validate a human input in json schema with pattern, but I cannot figure out the regex for it.
Valid formats are:
"Joe":"day1","Mitch":"day2"
or
"Joe":"day1"
So any number of iterations of "somebody":"someday" separated with , (comma).
Invalid formats are:
"Joe":"day1","Mitch":"day2",
Or
"Joe":"day1";"Mitch":"day2"
Example Json schema (pattern is not working here):
{
"title": "Test",
"type": "object",
"properties": {
"meeting_date": {
"type": "string",
"description": "Give me a name and a day with the following format: \"Joe\":\"day1\" ",
"default": "",
"pattern": "(\"\\w*\")(?::)(\"\\w*\")"
}
}
}

try this solution https://regex101.com/r/vW8m6K/2/
^("\w+":"\w+",)*"\w+":"\w+"$
But it fails on extra spaces, for it test:
^("\w+"\s*:\s*"\w+"\s*(?:,\s*|$))+$

Your pattern acutally almost works. You just have to remove the backslashes in front of your quotation marks.
("\w*")(?::)("\w*")
You can test your Regex on https://regex101.com/ (or some simular website).

Related

Storing JSON blob in an Avro Field

I have inherited project where an avro file that is being consumed by Snowflake. The schema of the avro is as follows:
{
"name": "TableName",
"namespace": "sqlserver",
"type": "record",
"fields": [
{
"name": "hAccount",
"type": "string"
},
{
"name": "hTableName",
"type": "string"
},
{
"name": "hRawJSON",
"type": "string"
}
]
}
The hRawJSON is a blob of JSON itself. The previous dev put this as a type of string, and this is where I believe the problem lies.
The application takes a JSON object (the JSON is varible so I never know the contents or what it contains) and populates the hRawJSON field in the Avro record. But it contains the escape characters for the double quotes in the string:
hAccount:"H11122"
hTableName:"Departments"
hRawJSON:"{\"DepartmentID\":1,\"ModelID\":0,\"Description\":\"P Medicines\",\"Margin\":\"3.300000000000000e+001\",\"UCSVATRateID\":0,\"References\":719,\"HeadOfficeID\":1,\"DividendID\":0}"
As a result the JSON blob is staged into Snowflake as a VARIANT field but still retains the escape characters:
Snowflake image
This means when querying the data in the JSON I constantly have to use this:
PARSE_JSON(RAW_FILE:hRawJSON):DepartmentID
I can't help feeling that the field type of string in the Avro file is causing the issue and that a different type should be used. I've tried Record, but without fields it's unuable. Doc also not working.
The other alternative is that this behavior is correct and when moving the hRawJSON from staging into "proper" tables I should use something like:
INSERT INTO DATA.PUBLIC.DEPARTMENTS
SELECT
RAW_FILE:hAccount::VARCHAR(4) as Account,
PARSE_JSON(RAW_FILE:hRawJSON) as JsonRaw
FROM DATA.STAGING.AVRO_RAW WHERE RAW_FILE:hTableName::STRING = 'Department';
So if this should be the correct approach and I'm over thinking this I'd appreciate guidance.

Valid string sequences in JSON Schema

Can anyone advise how to code up a JSON Schema document to describe a string that can be one of three possible sequences? Say a string "fruit" can be only the following: "apple", "bananna" or "coconut".
I was thinking it might be possible to use regex but not sure how to indicate the regex constraint in JSON Schema.
https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.section.6.1
Here is what I have so far:
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "TestSession",
"type": "object",
"properties": {
"fruit": {
"type": "string",
"description": "only three legal possibilities: apple, banana, or coconut"
}
}
You need to use the enum keyword for this.
The value of this keyword MUST be an array. This array SHOULD have
at least one element. Elements in the array SHOULD be unique.
An instance validates successfully against this keyword if its
value is equal to one of the elements in this keyword's array
value.
Elements in the array might be of any type, including null.
https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-00#section-6.1.2
For example "enum": [ "apple", "bananna", "coconut" ].

Creating a type definition for a property named "type" using JSON schema

I'm trying to create a JSON schema for an existing JSON file that looks something like this:
{
"variable": {
"name": "age",
"type": "integer"
}
}
In the schema, I want to ensure the type property has the value string or integer:
{
"variable": {
"name": "string",
"type": {
"type": "string",
"enum": ["string", "integer"]
}
}
}
Unfortunately it blows up with message: ValidationError {is not any of [subschema 0]....
I've read that there are "no reserved words" in JSON schema, so I assume a type of type is valid, assuming I declare it correctly?
The accepted answer from jruizaranguren doesn't actually answer the question.
The problem is that given JSON (not JSON schema, JSON data) that has a field named "type", it's hard to write a JSON schema that doesn't choke.
Imagine that you have an existing JSON data feed (data, not schema) that contains:
"ids": [ { "type": "SSN", "value": "123-45-6789" },
{ "type": "pay", "value": "8675309" } ]
What I've found in trying to work through the same problem is that instead of putting
"properties": {
"type": { <======= validation chokes on this
"type": "string"
}
you can put
"patternProperties": {
"^type$": {
"type": "string"
}
but I'm still working through how to mark it as a required field. It may not be possible.
I think, based on looking at the "schema" in the original question, that JSON schemas have evolved quite a lot since then - but this is still a problem. There may be a better solution.
According to the specification, in the Valid typessection for type:
The value of this keyword MUST be either a string or an array. If it is an array, elements of the array MUST be strings and MUST be unique.
String values MUST be one of the seven primitive types defined by the core specification.
Later, in Conditions for successful validation:
An instance matches successfully if its primitive type is one of the types defined by keyword. Recall: "number" includes "integer".
In your case:
{
"variable": {
"name": "string",
"type": ["string", "integer"]
}
}

json regex error

I wrote a json regex for url redirection validation
"redirect_uri": {
"type": "string",
"pattern": "^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,8})([\/\w \.-]*)*\/?$",
"description":"Application redirect_uri"
},
I am getting this error
illegal backslash escape sequence in string, at character offset 1223 (before "\da-z\.-]+)\.([a-...")
Where I am making mistakes . Everything seems to be fine to me
I'm not sure if it's satisfy your pattern requirement but this is a valid JSON:
"redirect_uri": {
"type": "string",
"pattern": "^(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,8})([\\/\\w \\.-]*)*\\/?$",
"description": "Application redirect_uri"
}
I just add some escape \ where needed.
I recommend to use some online JSON validator site like: http://jsonlint.com/

Can you put comments in Avro JSON schema files?

I'm writing my first Avro schema, which uses JSON as the schema language. I know you cannot put comments into plain JSON, but I'm wondering if the Avro tool allows comments. E.g. Perhaps it strips them (like a preprocessor) before parsing the JSON.
Edit: I'm using the C++ Avro toolchain
Yes, but it is limited. In the schema, Avro data types 'record', 'enum', and 'fixed' allow for a 'doc' field that contains an arbitrary documentation string. For example:
{"type": "record", "name": "test.Weather",
"doc": "A weather reading.",
"fields": [
{"name": "station", "type": "string", "order": "ignore"},
{"name": "time", "type": "long"},
{"name": "temp", "type": "int"}
]
}
From the official Avro spec:
doc: a JSON string providing documentation to the user of this schema (optional).
https://avro.apache.org/docs/current/spec.html#schema_record
An example:
https://github.com/apache/avro/blob/33d495840c896b693b7f37b5ec786ac1acacd3b4/share/test/schemas/weather.avsc#L2
Yes, you can use C comments in an Avro JSON schema : /* something */ or // something Avro tools ignores these expressions during the parsing.
EDIT: It only works with the Java API.
According to the current (1.9.2) Avro specification it's allowed to put in extra attributes, that are not defined, as metadata:
This allows you add comments like this:
{
"type": "record",
"name": "test",
"comment": "This is a comment",
"//": "This is also a comment",
"TODO": "As per this comment we should remember to fix this schema" ,
"fields" : [
{
"name": "a", "type": "long"
},
{
"name": "b", "type": "string"
}
]
}
No, it can't in the C++ nor the C# version (as of 1.7.5). If you look at the code they just shove the JSON into the JSON parser without any comment preprocessing - bizarre programming style. Documentation and language support appears to be pretty sloppy...