Is it a good idea to include schema name in JSON document - json

We are developing a public service that ingests JSON messages to be stored in the database.
Now this service is probably the first of many, and I'm working on a way to structure the JSON Schemas. We have a lot of experience with XML Schema, but JSON Schema is a bit new to us.
One of the ideas is to include a Header section in each JSON Schema that contains the schema name, the major version, and a unique message ID.
Here's a simplified version of such a schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"id": "http://www.example.com/json/01/sampleMessage",
"type": "object",
"title": "Sample Message for stackoverflow",
"description": "version 01.01",
"properties": {
"Header": {
"$ref": "#/definitions/Header"
},
"EanGsrn": {
"description": "Unique identifier of the Headpoint.",
"type": "string",
"pattern": "^[0-9]{18}$"
},
"Sector": {
"description": "Sector for which the Headpoint is valid.",
"type": "string",
"enum": [
"Electricity", "Gas"
]
}
},
"additionalProperties": false,
"required": [
"Header", "EanGsrn", "Sector"
],
"definitions": {
"Header": {
"id": "#Header",
"type": "object",
"description": "Standard header for all messages",
"properties": {
"Schema": {
"description": "$id of the schema of this message",
"type": "string",
"enum": ["http://www.example.com/json/01/sampleMessage"]
},
"Version": {
"description": "The specific version of the schema of this message",
"type": "string",
"pattern": "^01\\.[0-9]{2,3}$"
},
"MessageId": {
"description": "Unique identifier for the Message",
"type": "string",
"pattern": "^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
}
},
"required": [
"Schema", "Version", "MessageId"
],
"additionalProperties": false
}
}
}
The advantages of this should be:
A message for the wrong schema or major version would be rejected immediately at the schema validation step.
The JSON itself contains information about its schema and version, making life easier for people who need to investigate issues later on.
Questions
Is this usual, or are there other best practices in the JSON world to handle things like this?
Is this a good idea, or am I missing something obvious here?

There is no "best practice" for defining how a JSON instance identifies the schema it should conform to outside of an HTTP request.
The spec provides a header name to define the schema, but this only works if your JSON data is always served over HTTP.
Other systems similar to yours have included this information in the JSON data itself, in a header-like section, but the specification does not define an approach for doing so.
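A minimal sketch of how a service could read such a header to pick the schema before running full validation. It assumes the Header/Schema/Version layout from the question; KNOWN_SCHEMAS and identify_schema are hypothetical names for illustration, not part of any library:

```python
import json
import re

# Hypothetical registry: schema id -> major version this service accepts.
KNOWN_SCHEMAS = {
    "http://www.example.com/json/01/sampleMessage": "01",
}

# Same shape as the Version pattern in the question: "<major>.<minor>".
VERSION_RE = re.compile(r"^([0-9]{2})\.[0-9]{2,3}$")


def identify_schema(raw_message):
    """Return (schema_id, version) from the message header or raise ValueError."""
    message = json.loads(raw_message)
    header = message.get("Header") if isinstance(message, dict) else None
    if not isinstance(header, dict):
        raise ValueError("message has no Header object")

    schema_id = header.get("Schema")
    version = header.get("Version")
    if not isinstance(schema_id, str) or schema_id not in KNOWN_SCHEMAS:
        raise ValueError(f"unknown schema: {schema_id!r}")

    # Reject messages whose major version does not match the registered one.
    match = VERSION_RE.match(version) if isinstance(version, str) else None
    if match is None or match.group(1) != KNOWN_SCHEMAS[schema_id]:
        raise ValueError(f"unsupported version: {version!r}")
    return schema_id, version
```

The returned schema id can then be used to look up the schema document that the rest of the message is validated against.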

Related

Schema validation of multi-reference chained schema

I want to do three things
Validate JSON against a JSON-Schema
Create JSON-Schema to AVRO Schema converter
Create JSON-Schema to Hive Table converter
The problem I'm facing is the Schema has a referencing chain.
I'm trying to use this JSON Schema Validator, which resolves references and validates, but I'm getting some errors at the moment.
I haven't been able to find any library for the 2nd and the 3rd task.
I also have to create NiFi processors for these. I have done it for the first one.
One idea I have is to use an inline parser to dereference the schemas into one big schema and use that for the tasks, so that hopefully everything works smoothly afterward.
Any suggestions on a good approach to tackle these issues?
One of the schemas is attached. Any help would be appreciated.
{
"id": "/schemas/bi/events/identification/carrier",
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Users Carrier Identified",
"description": "A successfully identified carrier of a user",
"type": "object",
"definitions": {
"carrier_identification_result": {
"type": "object",
"properties": {
"mno": {
"type": "string",
"title": "Mobile network operator",
"description": "The Mobile network operator",
"example": "Telekom"
},
"mvno": {
"type": "string",
"title": " Mobile virtual network operator",
"description": "The Mobile virtual network operator.",
"example": "Mobilcom-Debitel"
},
"mcc": {
"type": "string",
"title": "Mobile Country Code",
"description": "The Mobile Country Code as defined in the ITU-T Recommendation E.212",
"example": "262"
},
"mnc": {
"type": "string",
"title": "Mobile Network Code",
"description": "The Mobile Network Code as defined in the ITU-T Recommendation E.212",
"example": "01"
},
"country": {
"type": "string",
"title": "The code ISO 3166-1 alpha 2 for the country",
"example": "DE"
}
},
"required": [
"mno",
"country"
]
}
},
"allOf": [
{
"$ref": "../identification_service.json"
},
{
"properties": {
"type": {
"constant": "identification.carrier",
"example": "identification.carrier"
},
"event_data": {
"allOf": [
{
"$ref": "../identification_service.json#/definitions/event_data"
},
{
"type": "object",
"properties": {
"result": {
"$ref": "#/definitions/carrier_identification_result"
},
"required": [
"result"
]
}
}
]
}
}
}
]
}
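The "one big schema" idea can be sketched as a recursive walk that replaces each local "$ref" with a copy of its target. This is a rough sketch with hypothetical function names: it only handles "#/..." pointers inside the same document, so relative file references such as "../identification_service.json" would have to be loaded and merged first and are left untouched here:

```python
import copy


def resolve_pointer(document, pointer):
    """Follow a local JSON Pointer such as '#/definitions/event_data'."""
    node = document
    for token in pointer[2:].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def inline_local_refs(node, root, seen=()):
    """Return a copy of node with every local '#/...' $ref replaced by its target."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if ref in seen:
                # A schema that refers back to itself cannot be fully inlined.
                raise ValueError(f"circular $ref: {ref}")
            target = copy.deepcopy(resolve_pointer(root, ref))
            return inline_local_refs(target, root, seen + (ref,))
        return {key: inline_local_refs(value, root, seen) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_local_refs(item, root, seen) for item in node]
    return node
```

Calling inline_local_refs(schema, schema) yields a schema with the local references expanded, which a converter can then walk without a resolver. Recursive schemas raise an error instead of looping forever.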

Should JSON-schema require documents to declare `$schema` (be self-referential)?

JSON-schema says that a JSON document can declare the schema to which the document conforms using the $schema property. Example:
{
"$schema": "http://example.com/example_fancy_schema#",
"example_fancy_property": "cute fluffy kittens"
}
where the schema looks like:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Example Fancy Schema",
"description": "The schema that describes the example format.",
"type": "object",
"properties": {
"example_fancy_property": {
"type": "string",
"enum": ["cute fluffy kittens"]
}
},
"additionalProperties": false,
"required": [ "example_fancy_property" ]
}
Does this mean that one should add a property for this in the actual schema, e.g.:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Example Fancy Schema",
"description": "The schema that describes the example format.",
"type": "object",
"properties": {
"$schema": {
"type": "string",
"enum": ["http://example.com/example_fancy_schema#"]
},
"example_fancy_property": {
"type": "string",
"enum": ["cute fluffy kittens"]
}
},
"additionalProperties": false,
"required": [ "$schema", "example_fancy_property" ]
}
None of the examples on the JSON-schema website appear to declare this, so I suspect one isn't supposed to declare it. But I'm curious anyway :)
The $schema keyword is recommended for use in JSON Schemas, to denote the version of the schema standard being used.
However, it has no special meaning in data. Over HTTP, there are recommended ways to associate data with a schema, but the $schema property is not one of them.
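As a rough illustration of the HTTP approach: as far as I understand the draft-04 recommendations, the schema is associated with the data through a "profile" media type parameter or a Link header with the "describedby" relation. The helper name below is hypothetical, and the exact header values should be checked against the spec version in use:

```python
def schema_link_headers(schema_uri):
    """Build HTTP response headers that point a JSON body at its schema."""
    return {
        # Media type parameter carrying the schema URI.
        "Content-Type": f'application/json; profile="{schema_uri}"',
        # Link header using the "describedby" link relation.
        "Link": f'<{schema_uri}>; rel="describedby"',
    }


headers = schema_link_headers("http://example.com/example_fancy_schema#")
```

The JSON body itself then stays free of any "$schema" property.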

JSON Hyper-Schema: different schemas for GET and POST

I want to describe an API whose fields allow values to be defined in different ways when POSTing an item, but are only ever output in one specific way.
For example, I might want to describe an API where an item can be created or updated like this: {"name": "Task", "due": "2014-12-31"} or like this: {"name": "Task", "due": {"$date": 1419984000000}}, but it is only ever returned from the API in the first way.
The schema for POST/PUT could therefore be:
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"due": {
"oneOf": [
{
"type": "string",
"format": "date"
},
{
"type": "object",
"properties": {
"$date": {
"type": "number"
}
},
"required": ["$date"],
"additionalProperties": false
}
]
}
}
}
Whereas the schema for access via GET would be much simpler:
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"due": {
"type": "string",
"format": "date"
}
}
}
It would be good for consumers of the API to know that they only have to account for one possible output format rather than all of them.
Is there any accepted standard approach to specify the different schemas within the context of JSON Hyper-Schema? I've thought about specifying these differences via the "links" property, but I do not know what "rel" I would define these schemas under and it seems very non-standard.
If I understood correctly and you want to specify one schema per operation, you can do it with standard hyper-schema. Let's see an example for a POST operation:
{
"description": "create an item.",
"href": "/items",
"method": "POST",
"rel": "create",
"schema": {
"$ref": "#/api/createitem"
},
"title": "Create an item"
}
The actual schema that is required is referenced in "schema" property through "$ref".
If you also wanted to describe the response types, you could use the "targetSchema" property. Be aware that this is advisory only (as explained in the docs).
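To make the schema/targetSchema split concrete, here is a small sketch that builds two link description objects, one for POST and one for GET. The make_link helper, the "/items/{id}" href, and the "#/api/item" pointer are hypothetical; only "#/api/createitem" comes from the example above:

```python
import json


def make_link(rel, method, href, request_schema=None, response_schema=None):
    """Build a hyper-schema link description object as a plain dict."""
    link = {"rel": rel, "method": method, "href": href}
    if request_schema is not None:
        # "schema" describes what the client is allowed to send.
        link["schema"] = {"$ref": request_schema}
    if response_schema is not None:
        # "targetSchema" describes what comes back (advisory only).
        link["targetSchema"] = {"$ref": response_schema}
    return link


links = [
    make_link("create", "POST", "/items", "#/api/createitem", "#/api/item"),
    make_link("self", "GET", "/items/{id}", response_schema="#/api/item"),
]
print(json.dumps({"links": links}, indent=2))
```

With this layout the permissive oneOf schema is only attached to the POST link's "schema", while both links advertise the simpler output schema through "targetSchema".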

Schema definition generated by an online schema generator is not accepted by BigQuery when using the Load Table API

Any relevant help will be appreciated.
I have several different JSON docs which need to be inserted into BigQuery. To avoid writing the schema manually, I am using online JSON Schema generation tools. But the schemas they generate are not accepted by the BigQuery Load Data wizard.
For example, for JSON data like this:
{"_id":100,"actor":"KK","message":"CCD is good in Pune",
"comment":[{"actor":"Subho","message":"CCD is not as good in Kolkata."},
{"actor":"bisu","message":"CCD is costly too in Kolkata"}]
}
the generated schema by online tool is:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"description": "Generated from c:jsonccd.json with shasum a003286a350a6889b152b3e33afc5458f3771e9c",
"type": "object",
"required": [
"_id",
"actor",
"message",
"comment"
],
"properties": {
"_id": {
"type": "integer"
},
"actor": {
"type": "string"
},
"message": {
"type": "string"
},
"comment": {
"type": "array",
"minItems": 1,
"uniqueItems": true,
"items": {
"type": "object",
"required": [
"actor",
"message"
],
"properties": {
"actor": {
"type": "string"
},
"message": {
"type": "string"
}
}
}
}
}
}
But when I put it into BigQuery in the Load Data wizard, it fails with errors.
How can this be mitigated?
Thanks.
The schema generated by that tool is way more complex than what BigQuery requires.
Look at the sample in the docs:
"schema": {
"fields": [
{"name":"f1", "type":"STRING"},
{"name":"f2", "type":"INTEGER"}
]
},
https://developers.google.com/bigquery/loading-data-into-bigquery?hl=en#loaddatapostrequest
Meanwhile, the tool mentioned in the question adds keywords like $schema, description, type, required, and properties, which are not part of the BigQuery schema format and confuse its parser.
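One way around this is to convert the generated JSON Schema into the "fields" list BigQuery expects. The following is a minimal sketch, not an official tool: to_bigquery_fields and TYPE_MAP are hypothetical names, and it assumes each property has a single simple "type" (no oneOf, no type arrays):

```python
# Assumed mapping from JSON Schema primitive types to BigQuery column types.
TYPE_MAP = {
    "string": "STRING",
    "integer": "INTEGER",
    "number": "FLOAT",
    "boolean": "BOOLEAN",
}


def to_bigquery_fields(schema):
    """Convert a JSON Schema object definition into a BigQuery 'fields' list."""
    required = set(schema.get("required", []))
    fields = []
    for name, prop in schema.get("properties", {}).items():
        mode = "REQUIRED" if name in required else "NULLABLE"
        if prop.get("type") == "array":
            # Arrays become REPEATED columns typed after their items.
            mode = "REPEATED"
            prop = prop.get("items", {})
        field = {"name": name, "mode": mode}
        if prop.get("type") == "object":
            # Nested objects become RECORD columns with their own fields.
            field["type"] = "RECORD"
            field["fields"] = to_bigquery_fields(prop)
        else:
            field["type"] = TYPE_MAP[prop["type"]]
        fields.append(field)
    return fields
```

For the schema in the question, this yields four top-level fields, with "comment" as a REPEATED RECORD containing "actor" and "message".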

What is the purpose of the "description" field in JSON Schemas?

I'm not sure what the purpose of a JSON Schema "description" field is. Does the field serve as a space to comment? Does the field serve as an ID?
{
"id": "http://www.noodle.org/entry-schema#",
"schema": "http://json-schema.org/draft-04/schema#",
"description": "schema for online courses",
"type": "object",
"properties": {
"institution": {
"type": "object",
"$ref" : "#/definitions/institution"
},
"person": {
"type": "object",
"items": {
"type": "object",
"$ref": "#/definitions/person"
}
}
},
"definitions": {
"institution": {
"description": "University",
"type": "object",
"properties": {
"name":{"type":"string"},
"url":{
"format": "uri",
"type": "string"
},
"descriptionofinstitution":{"type":"string"},
"location": {
"description": "location",
"type": "string",
"required": true
}
}
}
}
}
According to the JSON-Schema specification (http://json-schema.org/latest/json-schema-validation.html#anchor98), the purpose of the "description" (and "title") fields is to decorate a user interface with information about the data produced by this user interface. A title will preferably be short, whereas a description will explain the purpose of the instance described by this schema.
It is probably meant as additional explanation for a specific entry, in case the id is not enough. Of course, it doesn't affect the behavior of the code itself.
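As an illustration of the "decorate a user interface" use, here is a small sketch that collects a label and help text for each property from "title" and "description". The describe_properties helper is a hypothetical name, and it only follows nested "properties", not "$ref":

```python
def describe_properties(schema, prefix=""):
    """Return (path, label, help_text) tuples for every property in the schema."""
    rows = []
    for name, prop in schema.get("properties", {}).items():
        path = prefix + name
        # Fall back to the property name when no title is given.
        label = prop.get("title", name)
        rows.append((path, label, prop.get("description", "")))
        # Recurse into nested objects, using dotted paths.
        rows.extend(describe_properties(prop, prefix=path + "."))
    return rows
```

A form generator could use these rows for field labels and tooltips, while validation ignores them entirely.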