How to validate structure of json using phpleague/json-guard - json

I'm using Laravel 5.4 for an api and have an endpoint that accepts JSON along the lines of:
{
"input": {
"owner": "name of owner",
"content": [
]
}
}
I want to get only the JSON inside input and ensure that it is valid, both structurally and based on the content.
Using http://json-guard.thephpleague.com and their basic example from the overview page, everything comes back as valid no matter what I change as the input so I assume I am using it wrong.
From their example I have constructed the following. It passes validation. The issue is that I cannot get it to fail.
routes file
Route::post('validate', 'TestController#validateJson');
TestController#validateJson
public function validateJson()
{
$dereferencer = \League\JsonReference\Dereferencer::draft4();
$schema = json_decode('{ "properties": { "id": { "type": "string", "format": "uri" } } }');
$data = json_decode('{ "id": "https://json-guard.dev/schema#" }');
$validator = new \League\JsonGuard\Validator($data, $schema);
if ($validator->fails()) {
return response($validator->errors());
}
return response('all ok');
}
I believe I might need to use the JSON Reference and define a custom schema, but until I can fully understand the example and get it to fail, I don't want to do anything more complicated.

everything comes back as valid no matter what I change as the input so I assume I am using it wrong.
It is hard to say exactly what the issue is without seeing an example of what input you tried. Since the example schema validates that id is a string with the uri format, it should fail when you provide a string that is not a valid URI. The following example will return a format error:
<?php
require __DIR__ . '/vendor/autoload.php';
$schema = json_decode('{ "properties": { "id": { "type": "string", "format": "uri" } } }');
$data = json_decode('{ "id": "hello world" }');
$validator = new \League\JsonGuard\Validator($data, $schema);
echo json_encode($validator->errors(), JSON_PRETTY_PRINT);
There are some parts of JSON Schema that are unintuitive and you might not expect to pass.
A constraint only validates if it applies to that type. If you don't pass an object validation will pass because the properties constraint only applies to objects:
$data = null;
var_dump((new \League\JsonGuard\Validator($data, $schema))->passes()); // true
To make non object inputs fail you need to specify "type": "object". Note that this will also happen with invalid JSON, since json_decode will return null when it fails.
Validation will also pass with an empty object:
$data = json_decode('{}');
var_dump((new \League\JsonGuard\Validator($data, $schema))->passes()); // true
To make the properties required you have to use the required keyword.
I want to get only the JSON inside input and ensure that it is valid, both structurally and based on the content.
This schema should work as a starting point:
{
"type": "object",
"properties": {
"input": {
"type": "object",
"properties": {
"owner": {
"type": "string",
"minLength": 1"
},
"content": {
"type": "array"
}
},
"required": ["owner", "content"],
"additionalProperties": false
}
},
"required": ["input"],
"additionalProperties": false
}
We are requiring that the input is an object, that it has an input property and no other properties, and that input is an object with the properties owner and content. The minLength rule on owner is to prevent an empty string from passing.
I believe I might need to use the JSON Reference and define a custom schema
You can write pretty complex schemas without using JSON Reference at all. It's just a way to reuse a schema without copy and pasting. For example, you might define a 'money' schema and use that for any dollar amounts in your API.
You definitely will need to write a custom schema. The schema replaces the Laravel validation rules you would write for each endpoint otherwise.

Related

Elasticsearch dynamic mapping for object within attribute

Wondering if I can create a "dynamic mapping" within an elasticsearch index. The problem I am trying to solve is the following: I have a schema that has an attribute that contains an object that can differ greatly between records. I would like to mirror this data within elasticsearch if possible but believe that automatic mapping may get in the way.
Imagine a scenario where I have a schema like the following:
{
name: string
origin: string
payload: object // can be of any type / schema
}
Is it possible to create a mapping that supports this? I do not need to query the records by this payload attribute, but it would be great if I can.
Note that I have checked the documentation but am confused on if what elastic calls dynamic mapping is what I am looking for.
It's certainly possible to specify which queryable fields you expect the payload to contain and what those fields' mappings should be.
Let's say each doc will include the fields payload.livemode and payload.created_at. If these are the only two fields you'll want to perform queries on, and you'd like to disable dynamic, index-time mappings autogenerated by Elasticsearch for the rest of the fields, you can use dynamic templates like so:
PUT my-payload-index
{
"mappings": {
"dynamic_templates": [
{
"variable_payload": {
"path_match": "payload",
"mapping": {
"type": "object",
"dynamic": false,
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"livemode": {
"type": "boolean"
}
}
}
}
}
],
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Then, as you ingest your docs:
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": false,
"abc":"def"
}
}
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": true,
"modified_at": "2021-04-05 09:00:00"
}
}
and verify with
GET my-payload-index/_mapping
no new mappings will be generated for the fields payload.abc nor payload.modified_at.
Not only that — the new fields will also be ignored, as per the documentation:
These fields will not be indexed or searchable, but will still appear in the _source field of returned hits.
Side note: if fields are neither stored nor searchable, they're effectively the opposite of enabled.
The Big Picture
Working with variable contents of a single, top-level object is quite standard. Take for instance the stripe event object — each event has an id, an api_version and a few other shared params. Then there's the data object that's analogous to your payload field.
Now, all is fine, until you need to aggregate on the contents of your payload. See, since the content is variable, so are the data paths / accessors. But wildcards in aggregation paths don't work in Elasticsearch. Scripts do but are onerous to maintain.
Back to stripe. They partially solved it through what they call polymorphic, typed hashes — as discussed in their blog on API design:
A pretty neat approach that's worth emulating.
P.S. I discuss dynamic templates in more detail in the chapter "Mapping Automation" of my ES Handbook.

How should a json-schema (draft7) implementation resolve `$ref`s defined in unknown keywords?

JSON-Schema-Test-Suite defines schemas such as this, and I assume they are valid:
{
"tilda~field": {"type": "integer"},
"slash/field": {"type": "integer"},
"percent%field": {"type": "integer"},
"properties": {
"tilda": {"$ref": "#/tilda~0field"},
"slash": {"$ref": "#/slash~1field"},
"percent": {"$ref": "#/percent%25field"}
}
}
Take this example below:
{
"$id": "http://example.com/root.json",
"definitions": {
"A": { "type": "integer" }
},
"properties": {
"$id": {
"type": "string"
},
"attributes": {
"$ref": "#/tilda~0field/slash~1field/$id"
}
},
"tilda~field": {
"$id": "t/inner.json",
"slash/field": {
"$id": {
"$id": "test/b",
"$ref": "document.json"
}
}
}
}
Which of the following is the $ref at #/tilda~0field/slash~1field/$id/$ref resolved to?
http://example.com/root.json/document.json
http://example.com/t/document.json
http://example.com/t/test/document.json
Which of the $ids in #/tilda~0field must be considered as baseURI for the $ref in question and why.
$refs are not resolved within unknown keywords because $ref and $id only apply inside schemas. Let's look at an example to see what I mean by this.
{
"$id": "http://example.com/foo",
"type": "object",
"properties": {
"aaa": {
"const": { "$ref": "#/definitions/bbb" }
},
"bbb": { "$ref": "#/definitions/bbb" }
},
"ccc": { "$ref": "#/definitions/bbb" },
"definitions": {
"bbb": { "type": "string" }
}
}
The document as a whole is a schema, so /$id is understood by JSON Schema.
The properties keyword is defined as an object whose values are schemas. Therefore, the value at /properties/bbb is a schema and /properties/bbb/$ref is understood by JSON Schema.
The value of the const keyword is unconstrained. The value at /properties/aaa/const may look like a schema, but it's just a plain JSON object. Therefore /properties/aaa/const/$ref is not understood by JSON Schema.
The value at /ccc is not a JSON Schema keyword, so it's value is not constrained and not a schema. Therefore, the $id and $ref keywords are not understood by JSON Schema.
That's how it works now. When you go back to older drafts (draft-05 iirc), it's a little different. Before then, $ref was defined in a separate specification called JSON Reference. JSON Schema extended JSON Reference. Therefore, the semantics of $ref applied everywhere it appeared in a JSON Schema.
EDIT:
What happens when a $ref inside a known keyword references a schema deep inside an unknown keyword. for example, what if #/properties/bbb referenced #/ccc?
That's a really good question. Referencing #/ccc should be an error because #/ccc is not a schema and $ref only allows you to reference a schema.
I just saw your question on the JSON Schema site in an issue. I'll post here what I posted there as well. However, also look at the edit.
Which of the following is the $ref at #/tilda~0field/slash~1field/$id/$ref resolved to?
I think Section 8.2 gives the answer: "A subschema's "$id" is resolved against the base URI of its parent schema." This means that it will be http://example.com/t/test/document.json.
Which of the $ids in #/tilda~0field must be considered as baseURI for the $ref in question and why.
The base URI for this is the $id at the root, re-routed by the $id in tilda~field.
Start with http://example.com/root.json.
Change folders to t and use file inner.json.
Change folders to test (inside t) and use file document.json.
EDIT
Having a look at #customcommander's reply and realizing that the value in tilde~field isn't processed as a schema, I'd like to say that the #ref wouldn't be processed at all. It's just a plain JSON string with no inherent meaning.
Are you sure your second schema is valid though?
For example according to the JSON Schema Core specification, the $id property should be a string
If present, the value for this keyword MUST be a string
Therefore the following looks wrong to me:
"slash/field": {
"$id": {
"$id": "test/b",
"$ref": "document.json"
}
}
Then I think that your first $ref isn't correct either:
"attributes": {
"$ref": "#/tilda~0field/slash~1field/$id"
}
It should probably read:
"attributes": {
"$ref": "#/tilda~0field/slash~1field"
}
The same specification document also says this about $id and $ref:
Cf this example
The "$id" keyword defines a URI for the schema, and the base URI that other URI references within the schema are resolved against.
Or as Ajv simply puts it:
$ref is resolved as the uri-reference using schema $id as the base URI […].
Given the current state of your schema, I can only approximate the answer to your question but the ref would probably resolve to something like t/inner.json#document.json

How to validate number of properties in JSON schema

I am trying to create a schema for a piece of JSON and have slimmed down an example of what I am trying to achieve.
I have the following JSON schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Set name",
"description": "The exmaple schema",
"type": "object",
"properties": {
"name": {
"type": "string"
}
},
"additionalProperties": false
}
The following JSON is classed as valid when compared to the schema:
{
"name": "W",
"name": "W"
}
I know that there should be a warning about the two fields having the same name, but is there a way to force the validation to fail if the above is submitted? I want it to only validate when there is only one occurrence of the field 'name'
This is outside of the responsibility of JSON Schema. JSON Schema is built on top of JSON. In JSON, the behavior of duplicate properties in an object is undefined. If you want to get warning about this you should run it through a separate validation step to ensure valid JSON before passing it to a JSON Schema validator.
There is a maxProperties constraint that can limit total number of properties in an object.
Though having data with duplicated properties is a tricky case as many json decoding implementions would ignore duplicate.
So your JSON schema validation lib would not even know duplicate existed.

Creating a type definition for a property named "type" using JSON schema

I'm trying to create a JSON schema for an existing JSON file that looks something like this:
{
"variable": {
"name": "age",
"type": "integer"
}
}
In the schema, I want to ensure the type property has the value string or integer:
{
"variable": {
"name": "string",
"type": {
"type": "string",
"enum": ["string", "integer"]
}
}
}
Unfortunately it blows up with message: ValidationError {is not any of [subschema 0]....
I've read that there are "no reserved words" in JSON schema, so I assume a type of type is valid, assuming I declare it correctly?
The accepted answer from jruizaranguren doesn't actually answer the question.
The problem is that given JSON (not JSON schema, JSON data) that has a field named "type", it's hard to write a JSON schema that doesn't choke.
Imagine that you have an existing JSON data feed (data, not schema) that contains:
"ids": [ { "type": "SSN", "value": "123-45-6789" },
{ "type": "pay", "value": "8675309" } ]
What I've found in trying to work through the same problem is that instead of putting
"properties": {
"type": { <======= validation chokes on this
"type": "string"
}
you can put
"patternProperties": {
"^type$": {
"type": "string"
}
but I'm still working through how to mark it as a required field. It may not be possible.
I think, based on looking at the "schema" in the original question, that JSON schemas have evolved quite a lot since then - but this is still a problem. There may be a better solution.
According to the specification, in the Valid typessection for type:
The value of this keyword MUST be either a string or an array. If it is an array, elements of the array MUST be strings and MUST be unique.
String values MUST be one of the seven primitive types defined by the core specification.
Later, in Conditions for successful validation:
An instance matches successfully if its primitive type is one of the types defined by keyword. Recall: "number" includes "integer".
In your case:
{
"variable": {
"name": "string",
"type": ["string", "integer"]
}
}

how to give empty for email in json

I need to construct json schema for email which will accept either empty or email format.
I used as follows
"emailID": {
"type": "string",
"required": false,
"format": "(^$)|email"
}
But it is not validating email id is in right format or not.
even if email is simply A or * it is accepting.
But if I remove (^$) in format it is validating perfectly
how to put both conditions
First of all: this is a draft v3 schema; it is invalid against the current draft because of your required.
Then, your requirements are quite strange: either you say that the object member is not required, or if it is present it can either be an empty string or a valid email? Something is wrong somewhere.
Not that this is not doable; it is:
"emailID": {
"type": "string",
"oneOf": [
{ "enum": [ "" ] },
{ "format": "email" }
]
}
But given your "sloppy" set of constraints I am of the opinion that something is wrong somewhere.