Can I import table in a JSON Schema validation? - json

I am writing a JSON schema validation. I have an ID field whose values are imported from a table in SQL Server. These values are large and are frequently updated, so is there a way to dynamically connect to this table in the server and validate the JSON? Below is an example code of my schema:
{
"type": "object",
"required": ["employees"],
"properties": {
"employees": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": { "type": "integer", enum = [134,2123,3213,444,5525,6234,7532,825,9342]}
}
}
}
}
In place of 'enum' I want to connect to a table so the ID values are updated when the table is updated.

As Greg said, there is nothing in JSON Schema which allows you to do this.
Some implementations have created their own extensions to allow external sources. Many implementations allow custom keywords. You would have to check your documentation.
You should consider the cost of querying the database at the same time as checking structural correctness. It may be benficial to do ID checking which hits your database after you've confirmed the data is of the correct format and structure.

Related

JSON Schema object properties defined by enum

I'm attempting to reuse an enum in my JSON Schema to define the properties for an object.
I was wondering if the following is correct.
JSON Schema
{
"type": "object",
"propertyNames": {
"enum": ["Foo","Bar"]
},
"patternProperties": {
".*": {
"type": "number"
}
}
}
JSON Data
{
"Foo": 123,
"Bar": 456
}
The reason I ask is that I get inconsistent results from JSON Schema validation libraries. Some indicate the JSON validates, while others indicate the JSON is invalid.
p.s. if anyone is wondering "why" I'm trying to define the properties with an enum, it is because the enum is shared in various parts of my json schema. In some cases it is a constraint on a string, but I need the identical set of possible values both on those string properties and also on the object properties. As an enum I can maintain the set of possible values in one place.
Yes, that's a valid JSON Schema. You could also express it like this:
{
"type": "object",
"propertyNames": {
"enum": ["Foo","Bar"]
},
"additionalProperties": {
"type": "number"
}
}
It says "all property names must conform to this schema: (one of these values listed in the enum); also, all property values must conform to this schema: (must be numeric type)."
What errors do you get from the implementations that report this as invalid? Those implementations have a bug; would you consider reporting it to them?

Storing JSON blob in an Avro Field

I have inherited project where an avro file that is being consumed by Snowflake. The schema of the avro is as follows:
{
"name": "TableName",
"namespace": "sqlserver",
"type": "record",
"fields": [
{
"name": "hAccount",
"type": "string"
},
{
"name": "hTableName",
"type": "string"
},
{
"name": "hRawJSON",
"type": "string"
}
]
}
The hRawJSON is a blob of JSON itself. The previous dev put this as a type of string, and this is where I believe the problem lies.
The application takes a JSON object (the JSON is varible so I never know the contents or what it contains) and populates the hRawJSON field in the Avro record. But it contains the escape characters for the double quotes in the string:
hAccount:"H11122"
hTableName:"Departments"
hRawJSON:"{\"DepartmentID\":1,\"ModelID\":0,\"Description\":\"P Medicines\",\"Margin\":\"3.300000000000000e+001\",\"UCSVATRateID\":0,\"References\":719,\"HeadOfficeID\":1,\"DividendID\":0}"
As a result the JSON blob is staged into Snowflake as a VARIANT field but still retains the escape characters:
Snowflake image
This means when querying the data in the JSON I constantly have to use this:
PARSE_JSON(RAW_FILE:hRawJSON):DepartmentID
I can't help feeling that the field type of string in the Avro file is causing the issue and that a different type should be used. I've tried Record, but without fields it's unuable. Doc also not working.
The other alternative is that this behavior is correct and when moving the hRawJSON from staging into "proper" tables I should use something like:
INSERT INTO DATA.PUBLIC.DEPARTMENTS
SELECT
RAW_FILE:hAccount::VARCHAR(4) as Account,
PARSE_JSON(RAW_FILE:hRawJSON) as JsonRaw
FROM DATA.STAGING.AVRO_RAW WHERE RAW_FILE:hTableName::STRING = 'Department';
So if this should be the correct approach and I'm over thinking this I'd appreciate guidance.

MalformedURLException when using "$ref" in json schema

I have a json schema which refers to another json schema present in another folder using "$ref" (relative path) and i get a "MalformedURLException".
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$ref": "#/definitions/Base",
"definitions": {
"Base": {
"type": "object",
"additionalProperties": false,
"properties": {
"event": {
"$ref": "com/artifacts/click/ClickSchema.json"
},
"arrival_timestamp": {
"type": "integer",
"minimum": 0.0
}
},
"title": "Base"
}
}
}
And the click schema in another folder is as follows:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "com/artifacts/click/ClickSchema.json",
"Event": {
"type": "object",
"additionalProperties": false,
"properties": {
"sourceName": {
"type": "string"
}
}
}
}
Can someone please help. I am using this schema validator.
A JSON Schema knows nothing about where it sits in a file, and nothing about the other fils in a folder, by default.
It looks like the library you're using recognises this, and suggests you use a special reference protocol (classpath) to target other files in a folder with ease:
https://github.com/everit-org/json-schema#loading-from-the-classpath
As your schemas grow you will want to split that up into multiple
source files and wire them with "$ref" references. If you want to
store the schemas on the classpath (instead of eg. serving them
through HTTP) then the recommended way is to use the classpath:
protocol to make the schemas reference each other.
This isn't something defined by JSON Schema.
The more common approach is to load in all the schemas you intend to use, and allow for local resolution where you have the files already. The library you're using also supports this: https://github.com/everit-org/json-schema#registering-schemas-by-uri
Sometimes it is useful to work with preloaded schemas, to which we
assign an arbitary URI (maybe an uuid) instead of loading the schema
through a URL. This can be done by assigning the schemas to a URI with
the #registerSchemaByURI() method of the schema loader. Example:
SchemaLoader schemaLoader = SchemaLoader.builder()
.registerSchemaByURI(new URI("urn:uuid:a773c7a2-1a13-4f6a-a70d-694befe0ce63"), aJSONObject)
.registerSchemaByURI(new URI("http://example.org"), otherJSONObject)
.schemaJson(jsonSchema)
.resolutionScope("classpath://my/schemas/directory/")
.build();
There are additional considerations if you intend for your schemas to be used by others. If that's the case, do comment, and I'll expand.

How to validate number of properties in JSON schema

I am trying to create a schema for a piece of JSON and have slimmed down an example of what I am trying to achieve.
I have the following JSON schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Set name",
"description": "The exmaple schema",
"type": "object",
"properties": {
"name": {
"type": "string"
}
},
"additionalProperties": false
}
The following JSON is classed as valid when compared to the schema:
{
"name": "W",
"name": "W"
}
I know that there should be a warning about the two fields having the same name, but is there a way to force the validation to fail if the above is submitted? I want it to only validate when there is only one occurrence of the field 'name'
This is outside of the responsibility of JSON Schema. JSON Schema is built on top of JSON. In JSON, the behavior of duplicate properties in an object is undefined. If you want to get warning about this you should run it through a separate validation step to ensure valid JSON before passing it to a JSON Schema validator.
There is a maxProperties constraint that can limit total number of properties in an object.
Though having data with duplicated properties is a tricky case as many json decoding implementions would ignore duplicate.
So your JSON schema validation lib would not even know duplicate existed.

Restrict JSON values to the names of other JSON objects

I'd like to use JSON schema to validate some values. I two objects, call them trackedItems and trackedItemGroups. The trackedItemGroups are a group name and a list of trackedItems names. For example, the schema is similar to:
"TrackedItems": {
"type": "array",
"items": {
"type": "object",
"properties": {
"TrackedItemName": { "type": "string" },
"Properties": { ---- }
}
}
},
"TrackedItemGroups": {
"type": "array",
"items": {
"type": "object",
"properties": {
"GroupName": {
"type": "string"
},
"TrackedItems": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
I'd like to validate that every string in a TrackedItemGroups's TrackedItems array is a name that's been defined in TrackedItems.TrackedItemName.
This would be something like using the enum property to restrict the values, but the enum list is generated based on the values in TrackedITems.TrackedItemName.
How can I write the schema to use the JSON's own data for validation?
I'm aware I could move things around, i.e. the TrackedItems define the group they're in, but there are hundreds of tracked items and this organization works much better for my use case.
I've tried this:
"TrackedItems": {
"type": "array",
"items": {
"oneOf": [
{"$ref":"#/properties/TrackedItems/items/properties/TrackedItemName"}
]
}
}
But this results in an error:
Newtonsoft.Json.Schema.JSchemaReaderException: Could not resolve
schema reference
'#/properties/TrackedItems/items/properties/TrackedItemName'.
For a data example, if I had the TrackedItems:
Item1, Item2, ItemA, ItemB, ItemC
And groups:
Group1:
Item1, ItemB, ItemC
Group2:
Item1, Item2, ItemZ
Group2 would throw a violation because it contains an item not defined in TrackedItems.
Being a vocabulary for validation (and certain other things described by trivial assertions), JSON Schema does not provide a way to verify the consistency of data.
Validation means assertions like "Verify that X is a string."
Consistency means things like "Verify that X is the ID of an existing, active user."
Since data being compared might be in another database altogether, and since these sorts of assertions are non-trivial, JSON Schema leaves verifying the consistency of data up to the application and/or other technologies. Some implementations have vendor-specific extensions for intra-document comparisons, however these are not standardized, and I'm not aware of any that would work here.
A $ref reference doesn't work here, as it's just a way to substitute in another schema by reference. If you can manage to get the reference to work (and I'm not sure why you got an error, this is implementation-specific detail), this schema:
{ "oneOf": [
{"$ref":"#/properties/TrackedItems/items/properties/TrackedItemName"}
] }
Is the exact same thing as saying:
{ "oneOf": [
{"type": "string"}
] }
Since you're asking "verify that one of the following one statements is true", this is also the same as simply:
{"type": "string"}
This is not to say you can't declare relationships between data in JSON using JSON Schema, but JSON Schema is somewhat opinionated about using URIs and hyperlinks to do so.