Enforcing "style" rules in JSON Schema files?

I am looking at using JSON Schemas for an upcoming project, and looking for a way to validate our naming conventions/style and consistency rules in the JSON Schema file. Somewhat similar to StyleCop or Checkstyle.
Using this sample from JSON Schema Lint to illustrate:
{
"description": "Any validation failures are shown in the right-hand Messages pane.",
"type": "object",
"properties": {
"foo": {
"type": "number"
},
"bar": {
"type": "string",
"enum": [
"a",
"b",
"c"
]
}
}
}
Imagine another developer wants to add a new property, but I want to prevent property names from being upper-case (baz instead of Baz) or maybe boolean properties should start with "is" (isBaz). Is there a way to "unit test" the JSON Schema file and check for that?
"Baz": {
"type": "boolean"
},
It feels like a custom validator for the JSON Schema file (vs. using the JSON Schema to validate the JSON output). Does something like that already exist, or do I just parse the JSON schema file myself and write the rules?

It's completely possible to write a meta-schema that enforces this constraint on your schemas. Let's construct it step-by-step:
1. Constraining property names
The key part is to use patternProperties to specify which property names are allowed, and additionalProperties to disallow anything else:
{
"patternProperties": {
"^[a-z]+([A-Z][a-z]*)*$": {}
},
"additionalProperties": false
}
(For this example, I've used the regex ^[a-z]+([A-Z][a-z]*)*$ to detect alphabetic-only lowerCamelCase)
Note that it doesn't matter whether we provide any constraints for suitably-named properties (here it's just the empty schema {}). However, the presence of this definition means that any matching property is allowed, while anything else is banned by additionalProperties.
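To get a feel for what that regex accepts before wiring it into a meta-schema, here is a minimal Python sketch that applies the same pattern to the names under "properties". The helper name bad_property_names and the sample dictionary are made up for illustration; only the regex comes from the answer.

```python
import re

# The pattern from the answer: alphabetic-only lowerCamelCase.
LOWER_CAMEL = re.compile(r"^[a-z]+([A-Z][a-z]*)*$")


def bad_property_names(schema):
    """Return the names under "properties" that break the naming rule."""
    return [name for name in schema.get("properties", {})
            if not LOWER_CAMEL.match(name)]


# Hypothetical schema: the question's sample plus two new properties.
sample = {
    "type": "object",
    "properties": {
        "foo": {"type": "number"},
        "isBaz": {"type": "boolean"},
        "Baz": {"type": "boolean"},
    },
}

print(bad_property_names(sample))  # ['Baz']
```

This only looks at one level of "properties"; it does not recurse into sub-schemas.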
Fancier constraints
For other constraints (such as your "boolean properties must start with is" one), you just add more complex entries here.
This answer focuses more on how to make a generic recursive naming-style schema. It's already pretty long, so if you're looking for guidance on how to express a specific constraint, then it might be neater to ask as a separate question.
2. Applying to the properties property
This bit's pretty simple - make these constraints apply to the appropriate part of the schema:
{
"properties": {
"properties": {"$ref": "#/definitions/propertyStyleRule"}
},
"definitions": {
"propertyStyleRule": {
"patternProperties": {
"^[a-z]+([A-Z][a-z]*)*$": {}
},
"additionalProperties": false
}
}
}
3. Make it recursive
In fact, you don't just want to cover sub-schemas inside "properties", but also "items", "anyOf", etc.
Here it gets quite long, so I'll omit most of it, but basically you go through every keyword that might contain a schema, and make sure they are subject to the same naming-scheme by referencing the root schema:
{
"properties": {
"properties": {"$ref": "#/definitions/propertyStyleRule"},
"additionalProperties": {"$ref": "#"},
"items": {"$ref": "#"},
"not": {"$ref": "#"},
"allOf": {"$ref": "#"},
...
},
"definitions": {
"propertyStyleRule": {
"patternProperties": {
"^[a-z]+([A-Z][a-z]*)*$": {"$ref": "#"}
},
"additionalProperties": false
}
}
}
Note: we've also now replaced the empty schema ({}) in our "propertyStyleRule" definition with a reference back to the root ({"$ref": "#"}), so the sub-schemas inside properties also recurse properly.
4. Hang on, some of those keywords can be arrays, or booleans, or...
OK, so there's an obvious problem here: "not" holds a schema, so that's fine, but "allOf" holds an array of schemas, "items" can hold either, and "additionalProperties" can be a boolean.
We could do some fancy switching with different types, or we could simply add an items entry to our root schema:
{
"items": {"$ref": "#"},
"properties": {
...
},
"definitions": {
"propertyStyleRule": {...}
}
}
Because we haven't specified a type, our root schema actually allows instances to be objects/arrays/boolean/string/whatever - and if the instance isn't an object, then properties is just ignored.
Similarly, items is ignored unless the instance is an array - but if it is an array, then the entries must also follow the root schema. So it doesn't matter whether the value of "items" is a schema or an array of schemas, it recurses properly either way.
5. Schema maps
For a few keywords (like "patternProperties" or "definitions") the value is not a schema, it's a map of strings to schemas, so you can't just reference the root schema. For these, we'll make a definition "schemaMap", and reference that instead:
{
"items": {"$ref": "#"},
"properties": {
"properties": {"$ref": "#/definitions/propertyStyleRule"},
"additionalProperties": {"$ref": "#"},
"items": {"$ref": "#"},
"not": {"$ref": "#"},
"allOf": {"$ref": "#"},
...
"patternProperties": {"$ref": "#/definitions/schemaMap"},
...
},
"definitions": {
"schemaMap": {
"type": "object",
"additionalProperties": {"$ref": "#"}
},
"propertyStyleRule": {...}
}
}
... and you're done!
I've left out details, but hopefully it's clear enough how to write the full version.
Also, once you've written this once, it should be pretty easy to adapt it for different style rules, or even applying similar constraints to the names in "definitions", etc. If you do write a schema like this, please consider posting it somewhere so that other people can adapt it! :)
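If you would rather go the "parse the schema file myself" route from the question, the same recursion can be sketched in plain Python instead of a meta-schema. This is a different technique from the one above, not a replacement for it. The function name style_errors, the keyword lists, and the "is" prefix check are assumptions for illustration, and the keyword lists are deliberately incomplete.

```python
import re

LOWER_CAMEL = re.compile(r"^[a-z]+([A-Z][a-z]*)*$")

# Keywords whose value is a schema (or, for "items", possibly a list).
SCHEMA_KEYWORDS = ("items", "additionalProperties", "not")
# Keywords whose value is a list of schemas.
SCHEMA_LIST_KEYWORDS = ("allOf", "anyOf", "oneOf")
# Keywords whose value maps arbitrary names to schemas.
SCHEMA_MAP_KEYWORDS = ("patternProperties", "definitions")


def style_errors(schema, path="#"):
    """Yield (path, message) for each naming violation found in schema."""
    if isinstance(schema, list):      # e.g. "items" or "allOf" as an array
        for index, sub in enumerate(schema):
            yield from style_errors(sub, f"{path}/{index}")
        return
    if not isinstance(schema, dict):  # booleans and the like carry no names
        return
    for name, sub in schema.get("properties", {}).items():
        here = f"{path}/properties/{name}"
        if not LOWER_CAMEL.match(name):
            yield here, "not lowerCamelCase"
        elif (isinstance(sub, dict) and sub.get("type") == "boolean"
              and not name.startswith("is")):
            yield here, "boolean property should start with 'is'"
        yield from style_errors(sub, here)
    for key in SCHEMA_KEYWORDS + SCHEMA_LIST_KEYWORDS:
        if key in schema:
            yield from style_errors(schema[key], f"{path}/{key}")
    for key in SCHEMA_MAP_KEYWORDS:
        for name, sub in schema.get(key, {}).items():
            yield from style_errors(sub, f"{path}/{key}/{name}")


# Hypothetical schema with one violation at the top and one nested in "items".
schema = {
    "type": "object",
    "properties": {
        "foo": {"type": "number"},
        "Baz": {"type": "boolean"},
        "nested": {
            "type": "array",
            "items": {"properties": {"ready": {"type": "boolean"}}},
        },
    },
}

errors = list(style_errors(schema))
for where, message in errors:
    print(where, "-", message)
```

A function like this drops into a unit test easily: load the schema file with json.load and assert that the list of errors is empty.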

Related

JSON Schema construction for objects with common properties but differing in some properties

I have a number of objects which share a common set of features but differ in one or more properties. The common content is specified as media content in the definitions. I have provided one such object with a 'format' property, but there are other objects, omitted to keep it short, that also have additional properties. Here is a snippet of my attempt at constructing the schema. Is this the correct way to accomplish this? Many thanks
"definitions": {
"media-content":{
"type": "object",
"title": {
"type": "string"
},
"related-media": {
"type": "object",
"additionalProperties": {
"type": "string"
}
}
},
"type": "object",
"properties": {
"type": {
"format": "string",
"enum":["audio", "video"]
},
"category": {
"$ref": "#/definitions/media-content"
}
}
Is this the way to do it?
The first thing that stands out to me is that this isn't valid JSON Schema.
The title keyword provides a title for the schema so it expects a string, but because you've provided a schema, it looks like you're wanting it to be a property. Similarly related-media looks like you expect this to be a property. Are you missing wrapping these in a properties keyword, like you have later for type and category?
These changes would make media-content look like this:
"media-content":{
"type": "object",
"properties": {
"title": {
"type": "string"
},
"related-media": {
"type": "object",
"additionalProperties": {
"type": "string"
}
}
}
}
I have provided one such object with a 'format' property
Again, here, I'm not sure what you're getting at.
"properties": {
"type": {
"format": "string",
"enum":["audio", "video"]
},
"category": {
"$ref": "#/definitions/media-content"
}
}
This says you're expecting type to be a property in your object, but format isn't used right. Although the format keyword has some predefined types and does accept custom values, the way you're using it looks like you really want to be using the type keyword. In the end, it doesn't matter because enum will restrict the value to the items you declare in the array ("audio" or "video").
It might be easier to see what you're trying to do if you posted a full minimal example.
That said, I'll try to build one to answer the question I think you're asking.
It sounds like you're wanting to build a polymorphic relationship: an inheritance hierarchy where a base type defines some properties and a number of derived types define additional properties.
There's not really a good way to do that with JSON Schema because JSON Schema is a constraints system. That is, you start with {} where anything is valid, and by adding keywords, you're reducing the number of values that are valid.
Still, you might be able to achieve something close by using allOf and $ref.
First, declare the base property set in its own schema. I'd separate them into independent schemas for easier handling. You also need to give it an $id.
{
"$id": "/base-type",
"type": "object",
"properties": {
"base-prop-1": { "type": "string" },
"base-prop-2": { "type": "number" }
}
}
Next, for each of your "derived" schemas, you'll want to create a new schema, each with their own $id value, that references the base schema and declares its own additional requirements.
{
"$id": "/derived-type-1",
"allOf": [
{ "$ref": "/base-type" },
{
"properties": {
"derived-prop": { "type": "boolean" }
}
}
]
}
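This second schema applies every constraint from /base-type and also constrains derived-prop to hold a boolean value. Note that properties only constrains a property when it is present; add a required keyword if the property must exist.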
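To see how allOf combines the two schemas, here is a toy evaluator in Python. It is a sketch under heavy assumptions: it understands only $ref, type, properties and allOf, and the REGISTRY dictionary is a hypothetical stand-in for however your real evaluator looks schemas up by $id. It is not how any particular library works.

```python
# Hypothetical registry: maps each "$id" to its schema.
REGISTRY = {
    "/base-type": {
        "type": "object",
        "properties": {
            "base-prop-1": {"type": "string"},
            "base-prop-2": {"type": "number"},
        },
    },
    "/derived-type-1": {
        "allOf": [
            {"$ref": "/base-type"},
            {"properties": {"derived-prop": {"type": "boolean"}}},
        ],
    },
}


def type_ok(value, name):
    """Check a value against the handful of type names this sketch knows."""
    if name == "object":
        return isinstance(value, dict)
    if name == "string":
        return isinstance(value, str)
    if name == "boolean":
        return isinstance(value, bool)
    if name == "number":  # bool is a subclass of int in Python, so exclude it
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"type {name!r} is not handled by this sketch")


def is_valid(instance, schema):
    """Evaluate only $ref, type, properties and allOf."""
    if "$ref" in schema:  # siblings of $ref are ignored in this sketch
        return is_valid(instance, REGISTRY[schema["$ref"]])
    if "type" in schema and not type_ok(instance, schema["type"]):
        return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            # A property is only checked when it is present.
            if name in instance and not is_valid(instance[name], sub):
                return False
    # Every subschema in allOf must accept the same instance.
    return all(is_valid(instance, sub) for sub in schema.get("allOf", []))


derived = REGISTRY["/derived-type-1"]
print(is_valid({"base-prop-1": "a", "base-prop-2": 1, "derived-prop": True}, derived))  # True
print(is_valid({"base-prop-1": "a", "derived-prop": "yes"}, derived))  # False
print(is_valid({"base-prop-1": 5}, derived))  # False
print(is_valid({}, derived))  # True
```

The last call passes because neither schema uses required, which is the "constraints system" behaviour described above: properties narrows what a property may hold, it does not demand that the property exists.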

Json Schema Properties with $ref

Trying to use $ref for the entirety of properties. As far as I can tell this is syntactically valid, but it doesn't validate the payload. This should fail but doesn't.
I've also tried "$ref": "file:./ref.json".
schema:
{
"animal": {
"properties":{
"allOf": {"$ref": "file:./ref.json"}
}
},
"required": ["animal"]
}
ref.json:
{
"action":{
"type": "string"
},
"required": ["action"]
}
payload
{
"animal": {
"action": 2
}
}
"allOf": {"$ref": "file:./ref.json"} is not syntactically valid -- the value of an allOf must be an array. (your evaluator should be giving you a warning about this.)
JSON Schema evaluators are not required to support loading external files from disk or the network. Check your documentation for how to add documents to the evaluator so they can be used by $ref. (your evaluator should be giving you a warning when you reference an unknown resource.)
The reason why you are not seeing the above errors is because your overall schema has almost no recognized keywords in it -- you are missing a "properties": { ... } wrapped around "animal". The top level "keyword" "animal" is not recognized, so everything nested under it is ignored. The only keyword that is actually evaluated is "required", which your payload satisfies, therefore there is nothing to make it return an invalid result.
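One way to catch this kind of mistake early is to list the top-level keys of a schema that are not keywords at all. The sketch below is an assumption-laden lint, not a feature of any evaluator: KNOWN_KEYWORDS is a partial list chosen for this example, and unrecognized_keywords is a made-up helper name.

```python
# A partial, assumed list of keywords; a real evaluator knows many more.
KNOWN_KEYWORDS = {
    "$ref", "$id", "type", "properties", "required", "items", "enum",
    "allOf", "anyOf", "oneOf", "not", "additionalProperties",
    "patternProperties", "definitions",
}


def unrecognized_keywords(schema):
    """Return the top-level keys of schema that are not known keywords."""
    return sorted(key for key in schema if key not in KNOWN_KEYWORDS)


# The two documents from the question, as Python dictionaries.
schema = {
    "animal": {
        "properties": {
            "allOf": {"$ref": "file:./ref.json"},
        },
    },
    "required": ["animal"],
}
ref = {
    "action": {"type": "string"},
    "required": ["action"],
}

print(unrecognized_keywords(schema))  # ['animal']
print(unrecognized_keywords(ref))     # ['action']
```

Both documents have the same problem: a property name sitting where a keyword is expected, so the subschema under it is never applied.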

Restrict JSON values to the names of other JSON objects

I'd like to use JSON schema to validate some values. I have two objects, call them trackedItems and trackedItemGroups. The trackedItemGroups are a group name and a list of trackedItems names. For example, the schema is similar to:
"TrackedItems": {
"type": "array",
"items": {
"type": "object",
"properties": {
"TrackedItemName": { "type": "string" },
"Properties": { ---- }
}
}
},
"TrackedItemGroups": {
"type": "array",
"items": {
"type": "object",
"properties": {
"GroupName": {
"type": "string"
},
"TrackedItems": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
I'd like to validate that every string in a TrackedItemGroups's TrackedItems array is a name that's been defined in TrackedItems.TrackedItemName.
This would be something like using the enum property to restrict the values, but the enum list is generated based on the values in TrackedItems.TrackedItemName.
How can I write the schema to use the JSON's own data for validation?
I'm aware I could move things around, i.e. the TrackedItems define the group they're in, but there are hundreds of tracked items and this organization works much better for my use case.
I've tried this:
"TrackedItems": {
"type": "array",
"items": {
"oneOf": [
{"$ref":"#/properties/TrackedItems/items/properties/TrackedItemName"}
]
}
}
But this results in an error:
Newtonsoft.Json.Schema.JSchemaReaderException: Could not resolve
schema reference
'#/properties/TrackedItems/items/properties/TrackedItemName'.
For a data example, if I had the TrackedItems:
Item1, Item2, ItemA, ItemB, ItemC
And groups:
Group1:
Item1, ItemB, ItemC
Group2:
Item1, Item2, ItemZ
Group2 would throw a violation because it contains an item not defined in TrackedItems.
Being a vocabulary for validation (and certain other things described by trivial assertions), JSON Schema does not provide a way to verify the consistency of data.
Validation means assertions like "Verify that X is a string."
Consistency means things like "Verify that X is the ID of an existing, active user."
Since data being compared might be in another database altogether, and since these sorts of assertions are non-trivial, JSON Schema leaves verifying the consistency of data up to the application and/or other technologies. Some implementations have vendor-specific extensions for intra-document comparisons, however these are not standardized, and I'm not aware of any that would work here.
A $ref reference doesn't work here, as it's just a way to substitute in another schema by reference. If you can manage to get the reference to work (and I'm not sure why you got an error, this is implementation-specific detail), this schema:
{ "oneOf": [
{"$ref":"#/properties/TrackedItems/items/properties/TrackedItemName"}
] }
Is the exact same thing as saying:
{ "oneOf": [
{"type": "string"}
] }
Since you're asking "verify that one of the following one statements is true", this is also the same as simply:
{"type": "string"}
This is not to say you can't declare relationships between data in JSON using JSON Schema, but JSON Schema is somewhat opinionated about using URIs and hyperlinks to do so.
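Since the consistency check is left to the application, here is a minimal sketch of what that application-side check could look like in Python, run after ordinary schema validation has passed. The function name undefined_group_items is made up, and the document below is built from the data example in the question.

```python
def undefined_group_items(document):
    """Map each group name to the item names it uses that are not defined."""
    defined = {item["TrackedItemName"] for item in document["TrackedItems"]}
    problems = {}
    for group in document["TrackedItemGroups"]:
        missing = [name for name in group["TrackedItems"]
                   if name not in defined]
        if missing:
            problems[group["GroupName"]] = missing
    return problems


# The data example from the question.
document = {
    "TrackedItems": [
        {"TrackedItemName": name}
        for name in ["Item1", "Item2", "ItemA", "ItemB", "ItemC"]
    ],
    "TrackedItemGroups": [
        {"GroupName": "Group1", "TrackedItems": ["Item1", "ItemB", "ItemC"]},
        {"GroupName": "Group2", "TrackedItems": ["Item1", "Item2", "ItemZ"]},
    ],
}

print(undefined_group_items(document))  # {'Group2': ['ItemZ']}
```

The schema still does the structural work (types, required fields); this step only covers the cross-reference that JSON Schema cannot express.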

JSON Schema regarding use of $ref

I understand that $ref takes a URI to a json schema to use but where does $ref : "#" point to?
Does it just mean use the current schema for this block level? Or does it mean to use the root level schema defined in the root level id?
Thanks
EDIT:
So if I have:
"items": {
"anyOf": [
{ "$ref": "#" },
{ "$ref": "#/definitions/schemaArray" }
],
"default": {}
}
Because it lacks an id field it will attempt to validate the instance items with the root schema first and then if that fails try to validate it with the schemaArray schema defined in the definitions schema, right?
So if I change it to:
"items": {
"id" : "#/items",
"anyOf": [
{ "$ref": "#" },
{ "$ref": "#/definitions/schemaArray" }
],
"default": {}
}
Then the first subschema in anyOf array will point to the items schema itself?
EDIT #2: Okay so if I had:
"items": {
"id" : "itemSchema",
"anyOf": [
{ "$ref": "#" },
{ "$ref": "#/definitions/schemaArray" }
],
"default": {}
}
and
"stringArray": {
"type": "array",
"items": { "$ref" : "itemSchema" },
"minItems": 1,
"uniqueItems": true
}
"stringArray"'s "items" field would be validated against the above "itemSchema"?
Also does the second $ref in 'anyOf' work by going to the root and then traversing down the path till it hits that schema?
Thanks!
OK: each $ref is resolved into a full URI. Once that is done, all your questions are answered by asking the question: What schema would I end up with, if I simply fetched that URI? Where the $ref is, how it was loaded, all of that is irrelevant - it's entirely dependent on the resolved URI.
The library might take some shortcuts (like caching documents so they are only fetched once, or trusting one schema to "speak for" another), but those are all implementation details.
Response to original question:
# is not special: all values of $ref are resolved as URIs relative to the current document (or the closest value of "id", if there is one).
Therefore, if you haven't used "id", then # will point to the root of the schema document. If you fetched your schema from http://example.com/schema, then a {"$ref": "#"} anywhere inside that will resolve to http://example.com/schema#, which is the document itself.
It is different when you use "id", because it changes the "base" schema against which the $ref is resolved:
{
"type": "array",
"items": {
"id": "http://example.com/item-schema",
"type": "object",
"additionalProperties": {"$ref": "#"}
}
}
In that example, the $ref resolves to http://example.com/item-schema#. Now, if your JSON Schema setup trusts the schema it already has, then it can re-use the value from "items".
However, the point is there is nothing special about # - it just resolves to a URI like any other.
Response to EDIT 1:
Your first example is correct.
However, your second is unfortunately not. This is because of the way that fragment resolution works for URIs: one fragment completely replaces another. When you resolve the # against the "id" value of #/items, you don't end up with #/items again - you end up with #. So in your second example, the first entry in "anyOf" will still resolve to the root of the document, just as in the first example.
Response to EDIT 2:
Assuming the document is loaded from http://example.com/my-schema, the full URIs of your two $refs are:
http://example.com/itemSchema#
http://example.com/itemSchema#/definitions/schemaArray
For the first one, the library may use the schema it already has, but it might not - after all, looking at the URIs, http://example.com/my-schema might not be trusted to accurately represent http://example.com/itemSchema.
For the second one - that's not going to work, because the "itemSchema" doesn't have a "definitions" section, so that $ref won't resolve properly at all.
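The resolution steps in all three responses can be reproduced with ordinary URI resolution. Here is a small sketch using Python's urllib.parse; the resolve helper is made up for this example, and http://example.com/my-schema is the assumed document location used above. It returns the document part and the fragment separately, so an empty fragment (#) shows up as an empty string.

```python
from urllib.parse import urldefrag, urljoin


def resolve(base, ref):
    """Resolve ref against base and return (document URI, fragment)."""
    return tuple(urldefrag(urljoin(base, ref)))


doc = "http://example.com/my-schema"

# No "id": "#" points at the root of the document itself.
print(resolve(doc, "#"))

# EDIT 1: "id" is "#/items", so the base only gains a fragment,
# and resolving "#" against it replaces that fragment.
items_base = urljoin(doc, "#/items")
print(resolve(items_base, "#"))

# EDIT 2: "id" is "itemSchema", which changes the document part of the base.
item_schema_base = urljoin(doc, "itemSchema")
print(resolve(item_schema_base, "#"))
print(resolve(item_schema_base, "#/definitions/schemaArray"))
```

The first two calls both land on http://example.com/my-schema with an empty fragment, which is why the "id" of #/items makes no difference. The last two land on http://example.com/itemSchema, with fragments of "" and /definitions/schemaArray respectively.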

How would you design JSON Schema for an arbitrary key?

I have the following JSON output data:
{
"label_name_0" : 0,
"label_name_5" : 3,
.
.
.
"label_name_XXX" : 4
}
The output is simple: a key[1] name associated with an integer value. If the key name doesn't change, I can easily come up with a JSON Schema similar to this:
{
"type": "array",
"title": "Data output",
"items": {
"properties": {
"label_name": {
"type": "integer",
"default": 0,
"readonly": true
}
}
}
}
Since the key name itself is not known and keeps changing, I have to design a schema for it. The only thing I know is that the key is a string of no more than 100 characters. How do I define a JSON Schema for the key label_name_xxx that keeps changing?
[1] Not sure if I am using the right terminology
On json-schema.org you will find something appropriate in the File System Example section. You can define patternProperties inside an object.
{
"type": "object",
"properties": {
"/": {}
},
"patternProperties": {
"^(label_name_[0-9]+)+$": { "type": "integer" }
},
"additionalProperties": false
}
The regular expression ^(label_name_[0-9]+)+$ should fit your needs. JSON Schema patterns are not implicitly anchored, so the expression is anchored explicitly with ^ and $. Each property name consists of label_name_ followed by at least one digit between 0 and 9 ([0-9]+), and there can be arbitrarily many digits. The outer + lets that whole group repeat within a single name (so label_name_1label_name_2 also matches); it does not require the object to contain at least one property.
By setting additionalProperties to false it constrains object properties to match the regular expression.
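To check what that pattern actually accepts, here is a quick sketch with Python's re module. The helper non_matching_keys and the sample keys are made up for illustration. re.search is used because JSON Schema patterns are searched rather than implicitly anchored, which is why the ^ and $ matter.

```python
import re

PATTERN = re.compile(r"^(label_name_[0-9]+)+$")


def non_matching_keys(data):
    """Return the keys that "additionalProperties": false would reject."""
    return [key for key in data if not PATTERN.search(key)]


# Hypothetical output object with a mix of matching and non-matching keys.
data = {
    "label_name_0": 0,
    "label_name_5": 3,
    "label_name_1label_name_2": 7,  # the outer + lets the group repeat
    "label_name_": 1,               # no digits after the prefix
    "label_name_XXX": 4,            # X is not a digit
}

print(non_matching_keys(data))  # ['label_name_', 'label_name_XXX']
```

If the repeated form is not wanted, dropping the outer group and + (leaving ^label_name_[0-9]+$) would rule it out.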
As Konrad's answer stated, use patternProperties. But use it in place of properties, which is not needed; I think Konrad just pasted from his reference example, which was expecting a path starting with /. In the example below, the pattern ^.*$ accepts any property name, the anyOf allows only values of type string or null, and "additionalProperties": false rejects anything the pattern does not cover.
"patternProperties": {
"^.*$": {
"anyOf": [
{"type": "string"},
{"type": "null"}
]
}
},
"additionalProperties": false
Simpler solution than patternProperties, since OP does not have any requirement on the key names (documentation):
{
"type": "object",
"additionalProperties": {
"type": "integer",
"default": 0,
"readonly": true
}
}
default and readonly included because they were included in the OP's initial suggestion, but they are not required.
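For comparison, here is roughly what that schema checks, written as a plain Python sketch: any key is accepted as long as its value is an integer. The 100-character limit is taken from the question and is not expressed by the schema above; check_output and its max_key_length parameter are made-up names.

```python
import json


def check_output(text, max_key_length=100):
    """Return a list of problems found in a JSON document given as text."""
    data = json.loads(text)
    if not isinstance(data, dict):
        return ["top level is not an object"]
    errors = []
    for key, value in data.items():
        if len(key) > max_key_length:
            errors.append(f"key of length {len(key)} exceeds {max_key_length}")
        # bool is a subclass of int in Python, so rule it out explicitly.
        # Assumption: 1.0 is treated as a non-integer here; JSON Schema
        # drafts differ on that point.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{key!r}: value must be an integer")
    return errors


print(check_output('{"label_name_0": 0, "label_name_5": 3}'))  # []
print(check_output('{"label_name_0": "zero"}'))
```

The second call reports one problem, because "zero" is a string rather than an integer.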