JSON Schema validation of arrays with mandatory and optional elements

I am developing a JSON Schema for validating documents like this one:
{
"map": [
{
"key": "mandatoryKey1",
"value": "value1"
},
{
"key": "mandatoryKey2",
"value": "value2"
},
{
"key": "otherStuff",
"value": "value3"
},
{
"key": "someMoreStuff",
"value": "value4"
}
]
}
The document needs to have a "map" array with elements containing keys and values. There MUST be two elements with mandatoryKey1 and mandatoryKey2. Any other key-value pairs are allowed. Order of the elements should not matter. I found this difficult to express in JSON Schema. I can force the schema to check for the mandatory keys like this (I left out the definitions part, as it is trivial):
"map": {
"type": "array",
"minItems": 2,
"items": {
"oneOf": [
{
"$ref": "#/definitions/mandatoryElement1"
},
{
"$ref": "#/definitions/mandatoryElement2"
}
]
}
}
The problems are:
It validates that a document includes the mandatory data, but does not permit any other key/value pairs.
It does not check for duplicates, so it can be cheated by including mandatoryElement1 twice. Uniqueness of items can only be checked by tuple validation, which I cannot apply here because the item order should not matter.
The basic problem I see here is that the array elements somehow need to know about each other, i.e. arbitrary key/value pairs are allowed ONLY IF the mandatory keys are present. This "conditional validation" does not seem to be possible with JSON Schema. Any ideas for a better approach?
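One way to sidestep the cross-element problem is to let the schema check only the shape of each element and to verify the mandatory keys in application code. The following is a minimal sketch in Python under stated assumptions, not a JSON Schema feature: check_map and MANDATORY_KEYS are hypothetical names, and a mandatory key that appears more than once is treated as an error.

```python
from collections import Counter

# Hypothetical constant for illustration: the keys that must appear exactly once.
MANDATORY_KEYS = {"mandatoryKey1", "mandatoryKey2"}


def check_map(document):
    """Return a list of problems with the "map" array (empty list means OK)."""
    entries = document.get("map")
    if not isinstance(entries, list):
        return ['"map" must be an array']

    problems = []
    counts = Counter()
    for index, entry in enumerate(entries):
        # Per-element shape check: an object with a string key and some value.
        well_formed = (
            isinstance(entry, dict)
            and isinstance(entry.get("key"), str)
            and "value" in entry
        )
        if not well_formed:
            problems.append(f"element {index} must be an object with key and value")
            continue
        counts[entry["key"]] += 1

    # Cross-element check: order does not matter, only how often each key occurs.
    for key in sorted(MANDATORY_KEYS):
        if counts[key] == 0:
            problems.append(f"missing mandatory key {key!r}")
        elif counts[key] > 1:
            problems.append(f"mandatory key {key!r} appears {counts[key]} times")
    return problems
```

Any other key/value pairs pass untouched, because only the keys listed in MANDATORY_KEYS are counted against a rule.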

Related

Can a JSON schema specify required properties in object OR array?

I am struggling to understand if it's possible to write a json schema that requires certain properties, but also allows those properties to be in different areas of the json file (e.g. a property value can be in the main top-level object OR it can be in an array - it just needs to be somewhere).
For example, I have some devices that collect multiple temperature records over the course of a few hours and send the records in batches. However, some of the devices send the software version once in the main object, while others send the software version along with each hourly temperature record (inside a "records" array).
Example 1 (swversion sent once in main object):
{
"name": "device1",
"swversion": "1.3.abc2",
"records": [
{
"time": "10am",
"temp": 2
},
{
"time": "11am",
"temp": 4
}
]
}
Example 2 (swversion sent inside "records" array):
{
"name": "device1",
"records": [
{
"time": "10am",
"temp": 2,
"swversion": "1.3.abc2"
},
{
"time": "11am",
"temp": 4,
"swversion": "1.3.abc2"
}
]
}
Using these examples, I would like to write my schema definition as follows (the first two bullets are easy, the last one is where I'm struggling):
Main object requires name property and records array
records array can contain objects where time and temp would be required
swversion is required somewhere (could be in the main object or inside records array)
Is there a feature I'm missing in json-schema that enforces required properties, yet allows the flexibility for said properties to be anywhere (e.g. within an object OR an array), as long as they are present somewhere?
The anyOf keyword is a boolean OR operation. At least one of the schemas must pass for the keyword to pass. The first schema requires that the "swversion" property is present at the top level. The second schema requires that the "swversion" property is present in each of the items in the "records" array.
{
... define the easy stuff here, then ...,
"anyOf": [
{ "required": ["swversion"] },
{
"properties": {
"records": {
"items": { "required": ["swversion"] }
}
}
}
]
}
In this example, "swversion" could appear in both places. If you want to ensure that it appears in only one place (top level or items), you can use oneOf instead of anyOf.
Is there a feature .. that allows the flexibility for said properties to be anywhere
Not directly, but it's not difficult to express this. You can define the structure of "swversion" itself in a definition that is re-used via a reference.
In pseudocode, that would be:
any of:
the main object contains a "swversion" property,
all the items under "records" contain a "swversion" property
In code:
{
"$defs": {
"swversion": {
"type": "string",
.. other constraints?
}
},
"type": "object",
"properties": {
... other property definitions ...,
"records": {
"items": {
"type": "object",
... other definitions for the mandatory portion of records ...
}
}
},
"anyOf": [
{
"$comment": "swversion is a member of the main object",
"required": [ "swversion" ],
"properties": {
"swversion": {
"$ref": "#/$defs/swversion"
}
}
},
{
"$comment": "swversion is a member of all the items under the records property",
"properties": {
"records": {
"items": {
"type": "object",
"required": [ "swversion" ],
"properties": {
"swversion": {
"$ref": "#/$defs/swversion"
}
}
}
}
}
}
]
}
Note that if you are using JSON Schema draft-07 or earlier, change $defs to definitions.
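To see which documents the anyOf accepts, its two branches can be mirrored in plain Python. This is only a sketch of the branching logic, not a JSON Schema validator, and has_swversion_somewhere is a hypothetical name. It follows the schema semantics in one detail worth knowing: a missing or empty "records" array satisfies the second branch vacuously, so if that matters, the main schema would also need to require "records" and probably set "minItems": 1.

```python
def has_swversion_somewhere(document):
    """Mirror of the anyOf: swversion at the top level, or in every record."""
    # Branch 1: "required": ["swversion"] on the main object.
    in_main_object = "swversion" in document

    # Branch 2: "required": ["swversion"] on every item under "records".
    # As in the schema, "required" only constrains objects, and the check is
    # vacuously true when "records" is missing, empty, or not an array.
    records = document.get("records")
    if not isinstance(records, list):
        records = []
    in_every_record = all(
        "swversion" in record for record in records if isinstance(record, dict)
    )

    return in_main_object or in_every_record
```

A document in which only some of the records carry "swversion", and the main object does not, fails both branches.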

Validate each JSON node with different JSON schema

I'm trying to make a system monitor that is highly customisable by the user. This customisation is achieved by using a JSON file to model the look of the system monitor. The JSON could look like this.
{
"_": "WINDOW",
"name": "myWindow",
"children": [
{
"_": "CPU",
"name": "cpuMonitor",
"freq_Unit": "MHZ"
},
{
"_": "NETWORK",
"name": "network",
"unit": "Kb/s"
},
{
"_": "DISK",
"name": "disk"
}
],
"background": "red"
}
As you can see, each object corresponds to this schema.
{
"$schema": "http://json-schema.org/draft-07/schema#",
"name":"Component",
"type": "object",
"properties":{
"_": {
"type": "string"
},
"name":{
"type":"string"
},
"children":{
"type":"array"
}
},
"required": ["_","name"]
}
But each component also has its own schema definition. I'd like to parse the whole JSON and validate each node against a different schema (first checking that it is a component, and then against its corresponding schema).
I had a look at RapidJSON and other libraries, but I didn't find a solution for validating nodes against different schemas. Do you know any library which could do that? Or is it even possible to validate JSON in this way?
All feedback on how to solve this will be appreciated.
Edit: Corrected schema :(
There's a simple approach for that: use the oneOf keyword to specify the layout of the array elements. Inside these nested declarations, you specify the fixed identifier (probably the content of your _ field) as a constant, so that exactly one nested schema matches each of your panel types.
Notes:
I had to specify the constant type identifier using the enum specifier because the regular constant specifier didn't work with the library I was using. This may also have been an oversight in the revision of the specification that it was based on.
A different approach is to split the validation into steps. First, verify that the elements of the array are objects and that they have a string field _ containing one of the supported types. Then, when iterating over the array, validate each element individually according to its _ field.
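A minimal sketch of that two-step approach in Python, using only the standard library. The per-type rules in REQUIRED_BY_TYPE are made up for illustration (the question does not say which fields each component needs), and validate_component is a hypothetical name; in a real program each entry would more likely be a full schema handed to a validator library.

```python
# Hypothetical per-type rules: extra fields that each component type must have.
REQUIRED_BY_TYPE = {
    "WINDOW": ["background"],
    "CPU": ["freq_Unit"],
    "NETWORK": ["unit"],
    "DISK": [],
}


def validate_component(node, path="$"):
    """Return a list of error strings for a component and all of its children."""
    if not isinstance(node, dict):
        return [f"{path}: component must be an object"]

    errors = []
    # Step 1: the generic check shared by every component.
    for field in ("_", "name"):
        if not isinstance(node.get(field), str):
            errors.append(f"{path}: missing string field {field!r}")

    # Step 2: dispatch on the "_" field to the type-specific rules.
    kind = node.get("_")
    if isinstance(kind, str):
        if kind not in REQUIRED_BY_TYPE:
            errors.append(f"{path}: unsupported type {kind!r}")
        else:
            for field in REQUIRED_BY_TYPE[kind]:
                if field not in node:
                    errors.append(f"{path}: {kind} requires {field!r}")

    # Children are components too, so the same function is applied recursively.
    children = node.get("children", [])
    if not isinstance(children, list):
        errors.append(f"{path}: children must be an array")
    else:
        for index, child in enumerate(children):
            errors += validate_component(child, f"{path}.children[{index}]")
    return errors
```

Supporting a new component type then means adding one entry to the table, without touching the traversal.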
In addition to Ulrich's answer, here's an example of what I'd do:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Component",
"type": "object",
"definitions": {
"base": {
"properties": {
"name": { "type": "string" },
"children": {
"type": "array",
"items": { "$ref": "#" }
}
},
"required": [ "_", "name" ]
},
"cpu": {
"properties": {
"_": { "const": "CPU" },
"freq_Unit": { "type": "string" }
}
},
"network": {
"properties": {
"_": { "const": "NETWORK" },
"unit": { "type": "string" }
}
},
"disk": {
"properties": {
"_": { "const": "DISK" }
}
},
"window": {
"properties": {
"_": { "const": "WINDOW" },
"background": { "enum": [ "red", "orange", "yellow", ... ] }
}
}
},
"allOf": [
{ "$ref": "#/definitions/base" },
{
"oneOf": [
{ "$ref": "#/definitions/cpu" },
{ "$ref": "#/definitions/network" },
{ "$ref": "#/definitions/disk" },
{ "$ref": "#/definitions/window" }
]
}
]
}
First, we require that any instance MUST adhere to base, which declares _ and name as required properties. Additionally, we declare a children array property that requires all items to also match this schema (giving us recursive behavior). This doesn't really do much except that it allows us to declare these things in one place instead of having to declare them in each of the four type definitions.
(Note that we don't declare _ in the properties list. This means that any value will pass for this portion of the schema. We clean it up in the next part. If you want to ensure that future components are declared with strings, then you can add a "type": "string" requirement to that property, but I don't feel it's necessary unless others are authoring those components.)
Second, we declare each of our specific types as separate definitions, using the const keyword to isolate the one we want. This construct is analogous to a switch (or case) statement. If the instance doesn't match one of these explicit options, it fails. If it's missing one of the required base properties, it fails.
This will get you where you want to be.
To take it further, there are two more things you can do:
Add required to the other definitions to say that the specific properties are also required (e.g. freq_Unit for the cpu definition).
Declare each of the definitions in separate files. This would allow you to add a new definition by simply adding a new file and referencing it in the main schema. In my opinion, it's a bit cleaner. Some people prefer to have it all in one file, though.

Restrict JSON values to the names of other JSON objects

I'd like to use JSON Schema to validate some values. I have two objects, call them trackedItems and trackedItemGroups. Each of the trackedItemGroups consists of a group name and a list of trackedItems names. For example, the schema is similar to:
"TrackedItems": {
"type": "array",
"items": {
"type": "object",
"properties": {
"TrackedItemName": { "type": "string" },
"Properties": { ---- }
}
}
},
"TrackedItemGroups": {
"type": "array",
"items": {
"type": "object",
"properties": {
"GroupName": {
"type": "string"
},
"TrackedItems": {
"type": "array",
"items": {"type": "string"}
}
}
}
}
I'd like to validate that every string in a TrackedItemGroups's TrackedItems array is a name that's been defined in TrackedItems.TrackedItemName.
This would be something like using the enum property to restrict the values, but the enum list is generated based on the values in TrackedItems.TrackedItemName.
How can I write the schema to use the JSON's own data for validation?
I'm aware I could move things around, i.e. the TrackedItems define the group they're in, but there are hundreds of tracked items and this organization works much better for my use case.
I've tried this:
"TrackedItems": {
"type": "array",
"items": {
"oneOf": [
{"$ref":"#/properties/TrackedItems/items/properties/TrackedItemName"}
]
}
}
But this results in an error:
Newtonsoft.Json.Schema.JSchemaReaderException: Could not resolve
schema reference
'#/properties/TrackedItems/items/properties/TrackedItemName'.
For a data example, if I had the TrackedItems:
Item1, Item2, ItemA, ItemB, ItemC
And groups:
Group1:
Item1, ItemB, ItemC
Group2:
Item1, Item2, ItemZ
Group2 would throw a violation because it contains an item not defined in TrackedItems.
Being a vocabulary for validation (and certain other things described by trivial assertions), JSON Schema does not provide a way to verify the consistency of data.
Validation means assertions like "Verify that X is a string."
Consistency means things like "Verify that X is the ID of an existing, active user."
Since data being compared might be in another database altogether, and since these sorts of assertions are non-trivial, JSON Schema leaves verifying the consistency of data up to the application and/or other technologies. Some implementations have vendor-specific extensions for intra-document comparisons, however these are not standardized, and I'm not aware of any that would work here.
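Since this check has to live in the application, here is a minimal sketch of it in Python. The property names follow the schema in the question; find_unknown_items is a hypothetical helper, and the document is assumed to have already passed schema validation, so the structure is not checked again.

```python
def find_unknown_items(document):
    """Map each group name to the item names it uses that are not defined."""
    # Collect every name defined under TrackedItems.
    defined = {item["TrackedItemName"] for item in document["TrackedItems"]}

    unknown = {}
    for group in document["TrackedItemGroups"]:
        missing = [name for name in group["TrackedItems"] if name not in defined]
        if missing:
            unknown[group["GroupName"]] = missing
    return unknown


# The data example from the question.
document = {
    "TrackedItems": [
        {"TrackedItemName": name}
        for name in ["Item1", "Item2", "ItemA", "ItemB", "ItemC"]
    ],
    "TrackedItemGroups": [
        {"GroupName": "Group1", "TrackedItems": ["Item1", "ItemB", "ItemC"]},
        {"GroupName": "Group2", "TrackedItems": ["Item1", "Item2", "ItemZ"]},
    ],
}

print(find_unknown_items(document))  # {'Group2': ['ItemZ']}
```

Running the schema validation first and this check second keeps the two concerns separate: the schema guarantees the shape, the application guarantees the references.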
A $ref reference doesn't work here, as it's just a way to substitute in another schema by reference. If you can manage to get the reference to work (and I'm not sure why you got an error; this is an implementation-specific detail), this schema:
{ "oneOf": [
{"$ref":"#/properties/TrackedItems/items/properties/TrackedItemName"}
] }
Is the exact same thing as saying:
{ "oneOf": [
{"type": "string"}
] }
Since you're asking "verify that one of the following one statements is true", this is also the same as simply:
{"type": "string"}
This is not to say you can't declare relationships between data in JSON using JSON Schema, but JSON Schema is somewhat opinionated about using URIs and hyperlinks to do so.

Creating a type definition for a property named "type" using JSON schema

I'm trying to create a JSON schema for an existing JSON file that looks something like this:
{
"variable": {
"name": "age",
"type": "integer"
}
}
In the schema, I want to ensure the type property has the value string or integer:
{
"variable": {
"name": "string",
"type": {
"type": "string",
"enum": ["string", "integer"]
}
}
}
Unfortunately it blows up with message: ValidationError {is not any of [subschema 0]....
I've read that there are "no reserved words" in JSON schema, so I assume a type of type is valid, assuming I declare it correctly?
The accepted answer from jruizaranguren doesn't actually answer the question.
The problem is that given JSON (not JSON schema, JSON data) that has a field named "type", it's hard to write a JSON schema that doesn't choke.
Imagine that you have an existing JSON data feed (data, not schema) that contains:
"ids": [ { "type": "SSN", "value": "123-45-6789" },
{ "type": "pay", "value": "8675309" } ]
What I've found in trying to work through the same problem is that instead of putting
"properties": {
"type": { <======= validation chokes on this
"type": "string"
}
}
you can put
"patternProperties": {
"^type$": {
"type": "string"
}
}
but I'm still working through how to mark it as a required field. It may not be possible.
I think, based on looking at the "schema" in the original question, that JSON schemas have evolved quite a lot since then - but this is still a problem. There may be a better solution.
According to the specification, in the Valid types section for type:
The value of this keyword MUST be either a string or an array. If it is an array, elements of the array MUST be strings and MUST be unique.
String values MUST be one of the seven primitive types defined by the core specification.
Later, in Conditions for successful validation:
An instance matches successfully if its primitive type is one of the types defined by keyword. Recall: "number" includes "integer".
In your case:
{
"variable": {
"name": "string",
"type": ["string", "integer"]
}
}

Enforcing "style" rules in JSON Schema files?

I am looking at using JSON Schemas for an upcoming project, and looking for a way to validate our naming conventions/style and consistency rules in the JSON Schema file. Somewhat similar to StyleCop or Checkstyle.
Using this sample from JSON Schema Lint to illustrate:
{
"description": "Any validation failures are shown in the right-hand Messages pane.",
"type": "object",
"properties": {
"foo": {
"type": "number"
},
"bar": {
"type": "string",
"enum": [
"a",
"b",
"c"
]
}
}
}
Imagine another developer wants to add a new property, but I want to prevent property names from starting with an upper-case letter (baz instead of Baz), or maybe require that boolean properties start with "is" (isBaz). Is there a way to "unit test" the JSON Schema file and check for that?
"Baz": {
"type": "boolean"
},
It feels like a custom validator for the JSON Schema file (vs. using the JSON Schema to validate the JSON output). Does something like that already exist, or do I just parse the JSON schema file myself and write the rules?
It's completely possible to write a meta-schema that enforces this constraint on your schemas. Let's construct it step-by-step:
1. Constraining property names
The key part is to use patternProperties to specify which property names are allowed, and additionalProperties to disallow anything else:
{
"patternProperties": {
"^[a-z]+([A-Z][a-z]*)*$": {}
},
"additionalProperties": false
}
(For this example, I've used the regex ^[a-z]+([A-Z][a-z]*)*$ to detect alphabetic-only lowerCamelCase)
Note that it doesn't matter whether you provide any constraints for suitably-named properties (here it's just the empty schema {}). However, the presence of this definition means that any matching property is allowed, while anything else is banned by additionalProperties.
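To get a feel for what the pattern above accepts, it can be tried directly against candidate names. A quick check in Python (the sample names are only illustrations, and is_lower_camel is a hypothetical helper; the pattern itself is the one from this answer):

```python
import re

# The lowerCamelCase pattern used in patternProperties above.
LOWER_CAMEL = re.compile(r"^[a-z]+([A-Z][a-z]*)*$")


def is_lower_camel(name):
    """Return True if the property name matches the naming pattern."""
    return LOWER_CAMEL.search(name) is not None


for name in ["baz", "isBaz", "fooBAR", "Baz", "foo_bar", "foo2"]:
    print(f"{name}: {is_lower_camel(name)}")
# baz, isBaz and fooBAR match; Baz, foo_bar and foo2 do not.
```

Note that runs of capitals such as fooBAR are accepted and digits are rejected, so the pattern may need adjusting if your convention differs.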
Fancier constraints
For other constraints (such as your "boolean properties must start with is" one), you just add more complex entries here.
This answer focuses more on how to make a generic recursive naming-style schema. It's already pretty long, so if you're looking for guidance on how to express a specific constraint, then it might be neater to ask as a separate question.
2. Applying to the properties property
This bit's pretty simple - make these constraints apply to the appropriate part of the schema:
{
"properties": {
"properties": {"$ref": "#/definitions/propertyStyleRule"}
},
"definitions": {
"propertyStyleRule": {
"patternProperties": {
"^[a-z]+([A-Z][a-z]*)*$": {}
},
"additionalProperties": false
}
}
}
3. Make it recursive
In fact, you don't just want to cover sub-schemas inside "properties", but also "items", "anyOf", etc.
Here it gets quite long, so I'll omit most of it, but basically you go through every keyword that might contain a schema, and make sure they are subject to the same naming-scheme by referencing the root schema:
{
"properties": {
"properties": {"$ref": "#/definitions/propertyStyleRule"},
"additionalProperties": {"$ref": "#"},
"items": {"$ref": "#"},
"not": {"$ref": "#"},
"allOf": {"$ref": "#"},
...
},
"definitions": {
"propertyStyleRule": {
"patternProperties": {
"^[a-z]+([A-Z][a-z]*)*$": {"$ref": "#"}
},
"additionalProperties": false
}
}
}
Note: we've also now replaced the empty schema ({}) in our "propertyStyleRule" definition with a reference back to the root ({"$ref": "#"}), so the sub-schemas inside properties also recurse properly.
4. Hang on, some of those keywords can be arrays, or booleans, or...
OK, so there's an obvious problem here: "not" holds a schema, so that's fine, but "allOf" holds an array of schemas, "items" can hold either, and "additionalProperties" can be a boolean.
We could do some fancy switching with different types, or we could simply add an items entry to our root schema:
{
"items": {"$ref": "#"},
"properties": {
...
},
"definitions": {
"propertyStyleRule": {...}
}
}
Because we haven't specified a type, our root schema actually allows instances to be objects/arrays/boolean/string/whatever - and if the instance isn't an object, then properties is just ignored.
Similarly, items is ignored unless the instance is an array - but if it is an array, then the entries must also follow the root schema. So it doesn't matter whether the value of "items" is a schema or an array of schemas, it recurses properly either way.
5. Schema maps
For a few keywords (like "patternProperties" or "definitions") the value is not a schema, it's a map of strings to schemas, so you can't just reference the root schema. For these, we'll make a definition "schemaMap", and reference that instead:
{
"items": {"$ref": "#"},
"properties": {
"properties": {"$ref": "#/definitions/propertyStyleRule"},
"additionalProperties": {"$ref": "#"},
"items": {"$ref": "#"},
"not": {"$ref": "#"},
"allOf": {"$ref": "#"},
...
"patternProperties": {"$ref": "#/definitions/schemaMap"},
...
},
"definitions": {
"schemaMap": {
"type": "object",
"additionalProperties": {"$ref": "#"}
},
"propertyStyleRule": {...}
}
}
... and you're done!
I've left out details, but hopefully it's clear enough how to write the full version.
Also, once you've written this once, it should be pretty easy to adapt it for different style rules, or even applying similar constraints to the names in "definitions", etc. If you do write a schema like this, please consider posting it somewhere so that other people can adapt it! :)
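If parsing the schema file yourself (the alternative raised in the question) turns out to be more convenient than a meta-schema, the same recursion can be sketched in a few lines of Python. This is an illustration under assumptions, not a complete linter: find_style_violations is a hypothetical name, the keyword lists cover only a handful of keywords, and the "boolean properties start with is" rule is applied only to properties that declare "type": "boolean" directly.

```python
import re

LOWER_CAMEL = re.compile(r"^[a-z]+([A-Z][a-z]*)*$")

# Keywords whose value is a schema or an array of schemas (deliberately incomplete).
SCHEMA_KEYWORDS = ("additionalProperties", "items", "not", "allOf", "anyOf", "oneOf")
# Keywords whose value is a map from arbitrary names to schemas.
SCHEMA_MAP_KEYWORDS = ("patternProperties", "definitions")


def find_style_violations(schema, path="#"):
    """Walk a schema recursively and return naming-rule violations as strings."""
    if not isinstance(schema, dict):
        # Booleans such as "additionalProperties": false carry no names to check.
        return []

    violations = []
    properties = schema.get("properties")
    if isinstance(properties, dict):
        for name, subschema in properties.items():
            where = f"{path}/properties/{name}"
            if not LOWER_CAMEL.search(name):
                violations.append(f"{where}: not lowerCamelCase")
            elif (
                isinstance(subschema, dict)
                and subschema.get("type") == "boolean"
                and not name.startswith("is")
            ):
                violations.append(f"{where}: boolean property should start with 'is'")
            violations += find_style_violations(subschema, where)

    for keyword in SCHEMA_KEYWORDS:
        value = schema.get(keyword)
        # A single schema and an array of schemas are handled the same way.
        subschemas = value if isinstance(value, list) else [value]
        for index, subschema in enumerate(subschemas):
            suffix = f"/{index}" if isinstance(value, list) else ""
            violations += find_style_violations(subschema, f"{path}/{keyword}{suffix}")

    for keyword in SCHEMA_MAP_KEYWORDS:
        value = schema.get(keyword)
        if isinstance(value, dict):
            for name, subschema in value.items():
                violations += find_style_violations(subschema, f"{path}/{keyword}/{name}")
    return violations
```

A unit test can then load the schema file with json.load and assert that the returned list is empty.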