I want to do three things:
1. Validate JSON against a JSON Schema
2. Create a JSON Schema to Avro schema converter
3. Create a JSON Schema to Hive table converter
The problem I'm facing is that the schemas form a chain of references.
I'm trying to use this JSON Schema Validator, which resolves references and validates, but I'm getting some errors at the moment.
I haven't been able to find any library for the 2nd and 3rd tasks.
I also have to create NiFi processors for these; I have done so for the first one.
One idea I have is to use an inline parser to dereference the schemas into one big schema and use that for all three tasks, in the hope that everything works smoothly afterward.
Any suggestions on a good approach to tackle these issues?
One of the schemas is attached. Any help would be appreciated.
{
"id": "/schemas/bi/events/identification/carrier",
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Users Carrier Identified",
"description": "A successfully identified carrier of a user",
"type": "object",
"definitions": {
"carrier_identification_result": {
"type": "object",
"properties": {
"mno": {
"type": "string",
"title": "Mobile network operator",
"description": "The Mobile network operator",
"example": "Telekom"
},
"mvno": {
"type": "string",
"title": " Mobile virtual network operator",
"description": "The Mobile virtual network operator.",
"example": "Mobilcom-Debitel"
},
"mcc": {
"type": "string",
"title": "Mobile Country Code",
"description": "The Mobile Country Code as defined in the ITU-T Recommendation E.212",
"example": "262"
},
"mnc": {
"type": "string",
"title": "Mobile Network Code",
"description": "The Mobile Network Code as defined in the ITU-T Recommendation E.212",
"example": "01"
},
"country": {
"type": "string",
"title": "The code ISO 3166-1 alpha 2 for the country",
"example": "DE"
}
},
"required": [
"mno",
"country"
]
}
},
"allOf": [
{
"$ref": "../identification_service.json"
},
{
"properties": {
"type": {
"constant": "identification.carrier",
"example": "identification.carrier"
},
"event_data": {
"allOf": [
{
"$ref": "../identification_service.json#/definitions/event_data"
},
{
"type": "object",
"properties": {
"result": {
"$ref": "#/definitions/carrier_identification_result"
},
"required": [
"result"
]
}
}
]
}
}
}
]
}
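The inlining idea mentioned above could be sketched roughly as follows. This is a minimal sketch, not a complete dereferencer: it assumes only local references of the form #/definitions/..., the helper names resolve_pointer and inline_refs are hypothetical, file references such as ../identification_service.json are left untouched and would have to be loaded separately, and a recursive schema would make it loop forever.

```python
def resolve_pointer(document, pointer):
    """Resolve a local JSON Pointer such as '#/definitions/foo' against document."""
    node = document
    for token in pointer.lstrip("#").split("/"):
        if not token:
            continue
        # Undo JSON Pointer escaping (~1 is '/', ~0 is '~').
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def inline_refs(node, root):
    """Return a copy of node with every local '$ref' replaced by its target."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#"):
            # Assumption: no recursive references, otherwise this never ends.
            return inline_refs(resolve_pointer(root, ref), root)
        return {key: inline_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, root) for item in node]
    return node


# Trimmed-down version of the schema above, for illustration only.
schema = {
    "definitions": {
        "carrier_identification_result": {
            "type": "object",
            "properties": {"mno": {"type": "string"}, "country": {"type": "string"}},
            "required": ["mno", "country"],
        }
    },
    "properties": {
        "result": {"$ref": "#/definitions/carrier_identification_result"},
        "parent": {"$ref": "../identification_service.json"},
    },
}

flat = inline_refs(schema, schema)
```

After this, flat["properties"]["result"] holds the full definition, while the external reference under "parent" is still a $ref that needs a separate loading step.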
Let's say I have two schemas defined as follows:
ADDRESS_CLASS_SCHEMA_DEFINITION = {
"title": "Address",
"type": "object",
"properties": {
"country_code": {
"$ref": "#/definitions/CountryCode"
},
"city_code": {
"title": "City Code",
"type": "string"
},
"zipcode": {
"title": "Zipcode",
"type": "string"
},
"address_str": {
"title": "Address Str",
"type": "string"
}
},
"required": [
"country_code",
"city_code",
"zipcode"
],
"definitions": {
"CountryCode": {
"title": "CountryCode",
"description": "An enumeration.",
"enum": [
"CA",
"USA",
"UK"
],
"type": "string"
}
}
}
EMPLOYEE_CLASS_SCHEMA_DEFINITION = {
"title": "Employee",
"type": "object",
"properties": {
"id": {
"title": "Id",
"type": "integer"
},
"name": {
"title": "Name",
"type": "string"
},
"email": {
"title": "Email",
"type": "string"
},
"telephone": {
"title": "Telephone",
"type": "string"
},
"address": {
"$ref": "#/definitions/Address"
}
},
"required": [
"id",
"name",
"email"
],
"definitions": {
"Address": ADDRESS_CLASS_SCHEMA_DEFINITION
}
}
I'm trying to reuse sub-schema definitions by defining each one as a constant and embedding it in definitions (for example, the Address schema is embedded through its constant in the definitions of the Employee schema). This works for each schema individually, but there is a JSON Pointer path issue in the Employee schema: #/definitions/CountryCode does not resolve there. I assumed #/definitions/CountryCode would be resolved relative to the Address schema, since it is declared inside that sub-schema, but my understanding seems to be wrong. I can make it work by flattening everything as shown below, but I do not want to take this route:
{
"title": "Employee",
"type": "object",
"properties": {
"id": {
"title": "Id",
"type": "integer"
},
"name": {
"title": "Name",
"type": "string"
},
"email": {
"title": "Email",
"type": "string"
},
"telephone": {
"title": "Telephone",
"type": "string"
},
"address": {
"$ref": "#/definitions/Address"
}
},
"required": [
"id",
"name",
"email"
],
"definitions": {
"CountryCode": {
"title": "CountryCode",
"description": "An enumeration.",
"enum": [
"CA",
"USA",
"UK"
],
"type": "string"
},
"Address": {
"title": "Address",
"type": "object",
"properties": {
"country_code": {
"$ref": "#/definitions/CountryCode"
},
"city_code": {
"title": "City Code",
"type": "string"
},
"zipcode": {
"title": "Zipcode",
"type": "string"
},
"address_str": {
"title": "Address Str",
"type": "string"
}
},
"required": [
"country_code",
"city_code",
"zipcode"
]
}
}
}
I'm wondering how to fix this. I've briefly looked into JSON Schema bundling and using $id, but the best-practice advice I found seems to recommend $id only when dealing with URIs. I would like to know the best practice here and how to fix this problem, and I would appreciate pointers on how to use $id correctly (for example, the constant-based approach seems to work when I provide identifiers like $id: Address and $id: Employee). Thanks in advance.
JSON Schema implementations work in JSON land. When you combine your schemas as in your example above, presumably in JavaScript/Node.js, any knowledge that there were separate schemas is lost by the time the result reaches the JSON Schema implementation for validation. (Combining schemas this way is generally not considered the best approach.)
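To see why the reference breaks, it can help to resolve the pointer by hand. The following is a minimal sketch: the helper resolve_pointer is hypothetical, and the two schemas are trimmed to the keys that matter. Once Address is embedded, the fragment #/definitions/CountryCode is looked up from the root of the combined Employee document, where no such key exists.

```python
ADDRESS = {
    "title": "Address",
    "properties": {"country_code": {"$ref": "#/definitions/CountryCode"}},
    "definitions": {
        "CountryCode": {"enum": ["CA", "USA", "UK"], "type": "string"}
    },
}
EMPLOYEE = {
    "title": "Employee",
    "properties": {"address": {"$ref": "#/definitions/Address"}},
    "definitions": {"Address": ADDRESS},
}


def resolve_pointer(root, ref):
    """Walk a '#/a/b' style fragment starting from the document root."""
    node = root
    for token in ref.lstrip("#").split("/"):
        if token:
            node = node[token]
    return node


ref = ADDRESS["properties"]["country_code"]["$ref"]

# Standalone, the root is the Address schema, so the pointer resolves.
standalone = resolve_pointer(ADDRESS, ref)

# Embedded, the root is the Employee schema, so the same pointer fails.
try:
    resolve_pointer(EMPLOYEE, ref)
    resolved_when_embedded = True
except KeyError:
    resolved_when_embedded = False

# The definition now lives one level deeper in the combined document.
nested = resolve_pointer(EMPLOYEE, "#/definitions/Address/definitions/CountryCode")
```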
The EASY fix here SHOULD be to define $id at the root of each of your schemas. Each should be a fully qualified URI. It doesn't really matter what they are at this point; they could be https://example.com/a and https://example.com/b. Then, in the primary schema, you can write $ref: https://example.com/b.
Implementations should provide you with a way to load in your other/non-primary schemas so the $id values can be stored in an index. Using $id in your other schema with a fully qualified URI will signify a "resource boundary".
https://json-schema.hyperjump.io is the only web playground to support multiple files/schemas/"Schema Resources", so you can test this out there to confirm your expectations.
Not all implementations make it easy or even provide a means to import your other schemas, but they should.
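The index of $id values described above can be pictured with a small sketch. This is not the API of any particular validator: build_index and resolve_ref are hypothetical helpers, the schemas are trimmed, and the https://example.com/... identifiers are placeholders in the spirit of the ones suggested above.

```python
from urllib.parse import urldefrag, urljoin

ADDRESS = {
    "$id": "https://example.com/address",
    "properties": {"country_code": {"$ref": "#/definitions/CountryCode"}},
    "definitions": {
        "CountryCode": {"enum": ["CA", "USA", "UK"], "type": "string"}
    },
}
EMPLOYEE = {
    "$id": "https://example.com/employee",
    "properties": {"address": {"$ref": "https://example.com/address"}},
}


def build_index(*schemas):
    """Map each root '$id' to its schema, one entry per schema resource."""
    return {schema["$id"]: schema for schema in schemas}


def resolve_ref(index, base_id, ref):
    """Resolve ref against the resource whose '$id' is base_id."""
    # A fragment-only ref such as '#/definitions/X' stays inside base_id.
    url, fragment = urldefrag(urljoin(base_id, ref))
    node = index[url]
    for token in fragment.split("/"):
        if token:
            node = node[token]
    return node


index = build_index(ADDRESS, EMPLOYEE)
address = resolve_ref(index, EMPLOYEE["$id"], EMPLOYEE["properties"]["address"]["$ref"])
country = resolve_ref(index, address["$id"], address["properties"]["country_code"]["$ref"])
```

Because the fragment is resolved against the Address resource rather than the Employee document, #/definitions/CountryCode finds its target without any flattening.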
If you have follow-up questions, feel free to leave a comment, or join the JSON Schema Slack server if they would be off-topic for Stack Overflow.
I have a JSON object like:
{
"result": [
{
"name" : "abc",
"email": "abc.test#mail.com"
},
{
"name": "def",
"email": "def.test#mail.com"
},
{
"name": "xyz",
"email": "abc.test#mail.com"
}
]
}
and schema for this:
{
"definitions": {},
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://example.com/object1607582431.json",
"title": "Root",
"type": "object",
"required": [
"result"
],
"properties": {
"result": {
"$id": "#root/result",
"title": "Result",
"type": "array",
"default": [],
"uniqueItems": true,
"items": {
"$id": "#root/result/items",
"title": "Items",
"type": "object",
"required": [
"name",
"email"
],
"properties": {
"name": {
"$id": "#root/result/items/name",
"title": "Name",
"type": "string"
},
"email": {
"$id": "#root/result/items/email",
"title": "Email",
"type": "string"
}
}
}
}
}
}
I am looking for a way to check that email is unique irrespective of name. How can I validate that every email is unique?
You can't. There are no keywords that let you compare one particular data value against another, other than uniqueItems, which compares an array element in toto against another.
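Since the schema alone cannot express this, one option is to run the uniqueness check in application code after schema validation. A minimal sketch in Python, assuming the document shape from the question; the helper name duplicate_values is hypothetical:

```python
from collections import Counter


def duplicate_values(items, key):
    """Return the values of key that occur more than once, in first-seen order."""
    counts = Counter(item[key] for item in items)
    return [value for value, count in counts.items() if count > 1]


document = {
    "result": [
        {"name": "abc", "email": "abc.test#mail.com"},
        {"name": "def", "email": "def.test#mail.com"},
        {"name": "xyz", "email": "abc.test#mail.com"},
    ]
}

duplicates = duplicate_values(document["result"], "email")
if duplicates:
    print("duplicate emails:", duplicates)
```

For the sample data, the first and third entries share an email, so the check reports that address even though uniqueItems is satisfied (the two objects differ in name).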
The JSON Schema specification does not currently support this.
You can see the active GitHub issue here: https://github.com/json-schema-org/json-schema-vocabularies/issues/22
However, there are various extensions of JSON Schema that do validate unique fields within lists of objects.
If you happen to be using Python, you can use JsonVL, a package I created. It can be installed with pip install jsonvl and then run with jsonvl data.json schema.json.
Code examples are in the GitHub repo: https://github.com/gregorybchris/jsonvl
I have the following schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "Bulk Create Entity",
"title": "Bulk Create Entity",
"type": "object",
"properties": {
"idempotence_key": {
"description": "Idempotence Key",
"type": "string"
},
"requested_by": {
"description": "Requested By",
"type": "string"
},
"updated_by_process_id": {
"description": "Process id which is creating this entity.",
"type": "string"
},
"entity_creation_requests": {
"description": "Entity creation requests.",
"type": "array",
"minItems": 1,
"items": {
"title": "Entity Creation Request",
"type": "object",
"properties": {
"type": {
"description": "Entity Type",
"type": "string"
},
"taxonomies": {
"description": "Entity Taxonomies",
"type": "array",
"items": {
"title": "Taxonomy",
"type": "object",
"properties": {
"name": {
"description": "Taxonomy Name",
"type": "string"
},
"value": {
"description": "Taxonomy Value",
"type": "string"
}
}
}
}
}
}
}
},
"required": [
"idempotence_key",
"requested_by",
"updated_by_process_id",
"entity_creation_requests"
]
}
Here, the root-level payload is an object with a key "entity_creation_requests", which is an array of objects. Each of those objects has an array property "taxonomies" containing a list of key-value pairs.
Depending on the "type" of each request under "entity_creation_requests", I want to validate that certain keys are present in its taxonomies list.
For example, for a creation request of type "product", I want the keys "MRP", "seller", etc. to be present in the taxonomies list.
Can we achieve this using a JSON Schema validator?
Here is something I have created: https://codebeautify.org/jsonviewer/cb80c728
Are there any other alternatives?
I am using this in a Spring Boot application (Java).
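As one possible alternative, the rule could be checked in application code after the structural validation has passed. The following is only a sketch of that logic, written in Python for brevity rather than Java; the mapping REQUIRED_TAXONOMIES and the helper missing_taxonomies are hypothetical names, and the sample payload is made up. Draft-07's if/then keywords combined with contains may also be able to express the rule inside the schema, which is not shown here.

```python
# Hypothetical mapping from request type to the taxonomy names it must carry.
REQUIRED_TAXONOMIES = {"product": {"MRP", "seller"}}


def missing_taxonomies(request, required=REQUIRED_TAXONOMIES):
    """Return the sorted required taxonomy names absent from one creation request."""
    present = {taxonomy.get("name") for taxonomy in request.get("taxonomies", [])}
    return sorted(required.get(request.get("type"), set()) - present)


# Made-up payload following the structure of the schema above.
payload = {
    "entity_creation_requests": [
        {"type": "product", "taxonomies": [{"name": "MRP", "value": "100"}]},
        {"type": "other", "taxonomies": []},
    ]
}

# Collect the missing names per request index; requests with nothing missing are skipped.
problems = {}
for index, request in enumerate(payload["entity_creation_requests"]):
    missing = missing_taxonomies(request)
    if missing:
        problems[index] = missing
```

Types without an entry in the mapping impose no requirement, so only the "product" request is flagged here.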
I have the following schema:
{
"$schema": "http://json-schema.org/schema#",
"$id": "http://api.hobnob.social/schemas/users/create.json",
"title": "Create User Schema",
"description": "For validating client-provided create user object",
"type": "object",
"properties": {
"email": {
"type": "string",
"format": "email"
},
"password": { "type": "string" },
"profile": { "$ref": "profile.json#" }
},
"required": ["email", "password"],
"additionalProperties": false
}
{
"$schema": "http://json-schema.org/schema#",
"$id": "http://api.hobnob.social/schemas/users/profile.json",
"title": "User Profile Schema",
"description": "For validating client-provided user profile object when creating and/or updating an user",
"type": "object",
"properties": {
"bio": { "type": "string" },
"summary": { "type": "string" },
"name": {
"type": "object",
"properties": {
"first": { "type": "string" },
"last": { "type": "string" },
"middle": { "type": "string" }
},
"additionalProperties": false
}
},
"additionalProperties": false
}
I am using ajv to validate against it. I get the expected results in almost all cases, but when validating a JSON document that includes either the bio or the summary field (as a string), no response comes back from ajv at all.
For example, I attempt to validate
{
"email": "e#ma.il",
"password": "password",
"profile": {
"name": {
"first": "firstname"
},
"bio":"this is a bio"
}
}
and no response at all comes back.
I tried consolidating the schema but that made no difference. I'm hoping I have made some simple beginner mistake that someone may spot! I have spent many hours trying to work out what is going wrong, but after all my debugging I am no further forward.
I got this working somehow, but I'm not sure why it started working.
In my test script I added a line to delete the test index from Elasticsearch. After that, all tests passed. I then removed the new line from my test script to see whether the failure would return, but it didn't.
I'm guessing the problem was somehow related to Elasticsearch...
I'm not sure what the purpose of a JSON Schema "description" field is. Does the field serve as a space to comment? Does the field serve as an ID?
{
"id": "http://www.noodle.org/entry-schema#",
"$schema": "http://json-schema.org/draft-04/schema#",
"description": "schema for online courses",
"type": "object",
"properties": {
"institution": {
"type": "object",
"$ref" : "#/definitions/institution"
},
"person": {
"type": "object",
"items": {
"type": "object",
"$ref": "#/definitions/person"
}
}
},
"definitions": {
"institution": {
"description": "University",
"type": "object",
"properties": {
"name":{"type":"string"},
"url":{
"format": "uri",
"type": "string"
},
"descriptionofinstitution":{"type":"string"},
"location": {
"description": "location",
"type": "string",
"required": true
}
}
}
}
}
According to the JSON Schema specification (http://json-schema.org/latest/json-schema-validation.html#anchor98), the purpose of the "description" (and "title") fields is to decorate a user interface with information about the data produced by this user interface. A title will preferably be short, whereas a description will explain the purpose of the instance described by this schema.
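As an illustration of that "decorate a user interface" role, a form generator might read these fields to produce labels and help text. A minimal sketch, assuming a flat object schema; the helper name describe_fields and the trimmed schema are hypothetical:

```python
def describe_fields(schema):
    """Return (label, help_text) pairs for each property of an object schema."""
    rows = []
    for name, subschema in schema.get("properties", {}).items():
        # Fall back to the property name when no title is given.
        label = subschema.get("title", name)
        help_text = subschema.get("description", "")
        rows.append((label, help_text))
    return rows


schema = {
    "description": "schema for online courses",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "location": {"description": "location", "type": "string"},
    },
}

rows = describe_fields(schema)
for label, help_text in rows:
    print(f"{label}: {help_text}")
```

Nothing in this sketch validates data; the fields are read purely as display metadata.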
It is probably meant as additional explanation that adds to what is known about a specific entry when the id is not enough. It does not affect the behavior of the code itself.