Avro Schema format Exception - “SecurityClassification” is not a defined name

I'm trying to use this Avro schema:
{
"type": "record",
"name": "ComplianceEntity",
"namespace": "com.linkedin.events.metadata",
"fields": [
{
"name": "fieldPath",
"type": "string"
},
{
"name": "complianceDataType",
"type": {
"type": "enum",
"name": "ComplianceDataType",
"symbols": [
"NONE",
"MEMBER_ID"
],
"symbolDocs": {
"NONE": "None of the following types apply",
"MEMBER_ID": "ID for LinkedIn members"
}
}
},
{
"name": "complianceDataTypeUrn",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "fieldFormat",
"type": [
"null",
{
"type": "enum",
"name": "FieldFormat",
"symbols": [
"NUMERIC"
],
"symbolDocs": {
"NUMERIC": "Numerical format, 12345"
},
"doc": "The field format"
}
]
},
{
"name": "securityClassification",
"type": "SecurityClassification"
},
{
"name": "valuePattern",
"default": null,
"type": [
"null",
"string"
]
}
]
}
To generate Java code from it, I run avro-tools:
java -jar ./avro-tools-1.8.2.jar compile schema ComplianceEntity.avsc .
But I am getting the following error message:
Exception in thread "main" org.apache.avro.SchemaParseException: "SecurityClassification" is not a defined name. The type of the "securityClassification" field must be a defined name or a {"type": ...} expression.
Could anyone tell me why SecurityClassification is not recognized as a defined name?

You are using SecurityClassification as the type of your field, but unlike complianceDataType, it is never defined anywhere in the schema. That is why Avro throws the exception on this field:
{
"name": "securityClassification",
"type": "SecurityClassification"
}
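If SecurityClassification is defined in its own schema file, pass all of your schemas to avro-tools, with dependency schemas listed before the schemas that reference them. Passing several schema files in one call is supported since Avro 1.5.3 (https://issues.apache.org/jira/browse/AVRO-877):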
java -jar ./avro-tools-1.8.2.jar compile schema SecurityClassification.avsc ComplianceEntity.avsc .
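To see why the parser rejects the field, here is a minimal Python sketch of the rule involved: a named type (record, enum, fixed) must already be defined, either earlier in the same schema or in a schema parsed before it, when a field refers to it by name. The helpers undefined_names and full_name are hypothetical, not part of avro-tools, and the name rules are simplified (no aliases, no namespace inherited from dotted names).

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_TYPES = {"record", "enum", "fixed"}


def full_name(name, namespace):
    """Qualify a simple name with the enclosing namespace (simplified rule)."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def undefined_names(schema, registry=None):
    """Return type names that are referenced before they are defined.

    `registry` collects the full names of defined types; sharing one set
    across calls mimics parsing several schema files in order.
    """
    registry = set() if registry is None else registry
    missing = []

    def walk(node, ns):
        if isinstance(node, str):  # a primitive or a reference by name
            if node not in PRIMITIVES and full_name(node, ns) not in registry:
                missing.append(node)
        elif isinstance(node, list):  # a union: check every branch
            for branch in node:
                walk(branch, ns)
        elif isinstance(node, dict):
            kind = node.get("type")
            if kind in NAMED_TYPES:
                ns = node.get("namespace", ns)
                # Register before walking the fields so recursive types resolve.
                registry.add(full_name(node["name"], ns))
                for field in node.get("fields", []):
                    walk(field["type"], ns)
            elif kind == "array":
                walk(node["items"], ns)
            elif kind == "map":
                walk(node["values"], ns)
            else:  # e.g. {"type": "string"}
                walk(kind, ns)

    walk(schema, None)
    return missing


entity = {
    "type": "record",
    "name": "ComplianceEntity",
    "namespace": "com.linkedin.events.metadata",
    "fields": [
        {"name": "fieldPath", "type": "string"},
        {"name": "securityClassification", "type": "SecurityClassification"},
    ],
}
print(undefined_names(entity))  # ['SecurityClassification']

# Hypothetical dependency schema; the symbols are made up for illustration.
security = {"type": "enum", "name": "SecurityClassification",
            "namespace": "com.linkedin.events.metadata",
            "symbols": ["PUBLIC", "CONFIDENTIAL"]}
registry = set()
undefined_names(security, registry)       # "parse" the dependency first
print(undefined_names(entity, registry))  # []
```

Reusing one registry across calls is what makes the order of the files on the command line matter in this sketch: the enum has to be seen before the record that refers to it.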

Related

Kafka cp-server fails message validation on a broker side

I need to validate messages on the broker side.
I run cp-server (I simply started the cp-all-in-one Compose file), and then:
1. created a topic
2. set confluent.value.schema.validation to true
3. registered a schema (JSON)
4. produced a message
Validation always fails. Why? Should I change the configuration?
Schema:
{
"$id": "http://example.com/models/data-item-definition.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"description": "test data item 1",
"properties": {
"array_val": {
"items": {
"type": "string"
},
"type": [
"array",
"null"
]
},
"int_val": {
"type": "integer"
},
"string_val": {
"type": "string"
}
},
"required": [
"string_val",
"int_val"
],
"title": "data item",
"type": "object"
}
Message:
{
"string_val": "text",
"int_val": 10,
"array_val": ["one", "two", "three"]
}
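The issue is that when confluent.value.schema.validation is true, each message value must start with the Confluent wire-format header: a magic byte (0) followed by the 4-byte ID of the registered schema, and only then the payload. A plain JSON message without this header fails validation.
See Wire Format in the Confluent Schema Registry documentation.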
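The header itself is easy to build by hand. Below is a minimal Python sketch, assuming the schema was registered and Schema Registry returned ID 1 (a placeholder; use the ID you actually got back). frame_json and unframe_json are hypothetical helpers; Confluent's JSON Schema serializers normally add this header for you, and sending the bytes to Kafka is not shown.

```python
import json
import struct

MAGIC_BYTE = 0


def frame_json(value, schema_id):
    """Prefix a JSON payload with the Confluent wire-format header."""
    payload = json.dumps(value).encode("utf-8")
    # 1 byte magic (0), then the schema ID as a 4-byte big-endian integer.
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe_json(data):
    """Split a framed message back into (schema_id, decoded JSON value)."""
    magic, schema_id = struct.unpack(">bI", data[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, json.loads(data[5:].decode("utf-8"))


message = {"string_val": "text", "int_val": 10, "array_val": ["one", "two", "three"]}
framed = frame_json(message, schema_id=1)  # 1 is a placeholder schema ID
print(framed[:5])  # b'\x00\x00\x00\x00\x01'
```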

Unique value for a property validation using json schema

I have a JSON object like:
{
"result": [
{
"name" : "abc",
"email": "abc.test#mail.com"
},
{
"name": "def",
"email": "def.test#mail.com"
},
{
"name": "xyz",
"email": "abc.test#mail.com"
}
]
}
and this schema for it:
{
"definitions": {},
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://example.com/object1607582431.json",
"title": "Root",
"type": "object",
"required": [
"result"
],
"properties": {
"result": {
"$id": "#root/result",
"title": "Result",
"type": "array",
"default": [],
"uniqueItems": true,
"items": {
"$id": "#root/result/items",
"title": "Items",
"type": "object",
"required": [
"name",
"email"
],
"properties": {
"name": {
"$id": "#root/result/items/name",
"title": "Name",
"type": "string"
},
"email": {
"$id": "#root/result/items/email",
"title": "Email",
"type": "string"
}
}
}
}
}
}
I am looking for a way to check that email is unique regardless of name. How can I validate that every email is unique?
You can't. There are no keywords that let you compare one particular data value against another, other than uniqueItems, which compares an array element in toto against another. The three objects above all differ as a whole, so they pass uniqueItems even though two of them share an email.
The JSON Schema specification does not currently support this.
You can follow the GitHub issue here: https://github.com/json-schema-org/json-schema-vocabularies/issues/22
However, there are various extensions of JSON Schema that do validate unique fields within lists of objects.
If you happen to be using Python, you can use JsonVL, a package I created. Install it with pip install jsonvl and run it with jsonvl data.json schema.json.
Code examples are in the GitHub repo: https://github.com/gregorybchris/jsonvl
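Without an extension, one workaround is to keep the schema for structure and check the field in application code after validation. A minimal Python sketch using only the standard library (this is not JsonVL; duplicates is a hypothetical helper), which also shows why uniqueItems does not catch the repeated email:

```python
import json
from collections import Counter

data = {
    "result": [
        {"name": "abc", "email": "abc.test#mail.com"},
        {"name": "def", "email": "def.test#mail.com"},
        {"name": "xyz", "email": "abc.test#mail.com"},
    ]
}


def duplicates(items, key):
    """Return the values of `key` that occur more than once, sorted."""
    counts = Counter(item[key] for item in items)
    return sorted(value for value, n in counts.items() if n > 1)


# uniqueItems compares whole elements: all three objects differ, so it passes.
whole_items = [json.dumps(item, sort_keys=True) for item in data["result"]]
print(len(set(whole_items)) == len(whole_items))  # True

# A per-field check catches the repeated email.
print(duplicates(data["result"], "email"))  # ['abc.test#mail.com']
```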

Create MongoDB collection with standard JSON schema

I want to create a MongoDB collection using a JSON schema file.
Suppose the file address.schema.json contains an address schema (this file is one of the json-schema.org examples):
{
"$id": "https://example.com/address.schema.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "An address similar to http://microformats.org/wiki/h-card",
"type": "object",
"properties": {
"post-office-box": {
"type": "string"
},
"extended-address": {
"type": "string"
},
"street-address": {
"type": "string"
},
"locality": {
"type": "string"
},
"region": {
"type": "string"
},
"postal-code": {
"type": "string"
},
"country-name": {
"type": "string"
}
},
"required": [ "locality", "region", "country-name" ],
"dependencies": {
"post-office-box": [ "street-address" ],
"extended-address": [ "street-address" ]
}
}
What MongoDB command, such as mongoimport or db.createCollection, creates a collection using the above schema?
It would be nice if I could use the file directly in MongoDB without having to change its format manually.
If JSON Schema is a standard format, why do I need to adapt it for MongoDB? Why doesn't MongoDB have this functionality built in?
You can create it with the create command or the db.createCollection shell helper:
db.createCollection(
"mycollection",
{validator:{$jsonSchema:{
"description": "An address similar to http://microformats.org/wiki/h-card",
"type": "object",
"properties": {
"post-office-box": { "type": "string" },
"extended-address": { "type": "string" },
"street-address": { "type": "string" },
"locality": { "type": "string" },
"region": { "type": "string" },
"postal-code": { "type": "string" },
"country-name": { "type": "string" }
},
"required": [ "locality", "region", "country-name" ],
"dependencies": {
"post-office-box": [ "street-address" ],
"extended-address": [ "street-address" ]
}
}}})
You only need to use bsonType instead of type if you want a type that exists in BSON but not in standard JSON Schema. You do have to remove the $id and $schema lines, because MongoDB's $jsonSchema support does not accept those keywords (see the $jsonSchema documentation).
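If you would rather use the file as-is than paste an edited copy into the shell, one option is to strip those keywords programmatically. A minimal Python sketch that builds the create command document; create_command is a hypothetical helper that only removes the top-level $id and $schema keys named above (nested ones and other unsupported keywords are left alone). Sending the command, for example with PyMongo's Database.command, assumes a driver is installed and is not shown.

```python
import json

# Keywords the answer above says must be removed. MongoDB's $jsonSchema
# documentation lists further unsupported keywords, so this is not exhaustive.
UNSUPPORTED_TOP_LEVEL = ("$id", "$schema")


def create_command(collection, schema_text):
    """Build a MongoDB `create` command document with a $jsonSchema validator."""
    schema = json.loads(schema_text)
    for key in UNSUPPORTED_TOP_LEVEL:
        schema.pop(key, None)  # drop the key if present
    # The command name must be the first key of the command document.
    return {"create": collection, "validator": {"$jsonSchema": schema}}


schema_text = """{
  "$id": "https://example.com/address.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {"locality": {"type": "string"}},
  "required": ["locality"]
}"""
cmd = create_command("mycollection", schema_text)
print(sorted(cmd["validator"]["$jsonSchema"]))  # ['properties', 'required', 'type']
```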
The only option is to add a $jsonSchema validator when creating the collection: https://docs.mongodb.com/manual/reference/operator/query/jsonSchema/#document-validator. Every document you then insert into or update in the collection must match the provided schema.

Avro runtime exception "Not a map" when returning a record as JSON

I have an Avro schema for UKRecord, which contains a list of CmRecord (also defined with an Avro schema):
{
"namespace": "com.uhdyi.hi.avro",
"type": "record",
"name": "UKRecord",
"fields": [
{
"name": "coupon",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "cm",
"type": [
"null",
{
"type": "array",
"items": {
"type": "record",
"name": "CmRecord",
"fields": [
{
"name": "id",
"type": "string",
"default": ""
},
{
"name": "name",
"type": "string",
"default": ""
}
]
}
}
],
"default": null
}
]
}
In my Java code, I create a UKRecord with all fields populated correctly. Eventually I need to return this object through a JSON-based API, but it fails with:
org.apache.avro.AvroRuntimeException: Not a map: {"type":"record","name":"CmRecord","namespace":"com.uhdyi.hi.avro","fields":[{"name":"id","type":"string","default":""},{"name":"name","type":"string","default":""}]}
The Java code that writes the object to JSON is:
ObjectWriter writer = objectMapper.writer(); // objectMapper: a Jackson ObjectMapper instance
if (obj != null) {
    response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "application/json; charset=UTF-8");
    byte[] bytes = writer.writeValueAsBytes(obj); // <-- fails here
    ...
}
obj is:
{"coupon": "c12345", "cm": [{"id": "1", "name": "name1"}, {"id": "2", "name": "name2"}]}
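Why do I get this error? Please help!
Because you are using unions, Avro cannot tell how to interpret the JSON you are providing. In Avro's JSON encoding, a non-null union value is wrapped in an object keyed by its branch type. Here's how you can change the JSON so Avro knows the values are not null: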
{
"coupon": { "string": "c12345" },
"cm": { "array": [
{ "id": "1", "name": "name1" },
{ "id": "2", "name": "name2" }
]
}
}
I know, it's really annoying how Avro chose to handle nulls.
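The wrapping rule can be captured in a few lines. A minimal Python sketch, assuming the plain dict from the question; union_value and encode_uk_record are hypothetical helpers that only produce the JSON shape shown above and do not call the Avro library.

```python
import json


def union_value(value, branch):
    """Encode a value for a ["null", <branch>] union in Avro's JSON encoding.

    null stays null; any other value is wrapped as {branch_name: value}.
    Named types (records, enums) would use their full name as the key.
    """
    return None if value is None else {branch: value}


def encode_uk_record(coupon, cm):
    # Field names and union branches follow the UKRecord schema in the question.
    # The CmRecord items inside the array are not unions, so they stay unwrapped.
    return {
        "coupon": union_value(coupon, "string"),
        "cm": union_value(cm, "array"),
    }


record = encode_uk_record("c12345", [{"id": "1", "name": "name1"},
                                     {"id": "2", "name": "name2"}])
print(json.dumps(record))
print(json.dumps(encode_uk_record(None, None)))  # {"coupon": null, "cm": null}
```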

JSON.Net: Schema validates where it shouldn't when using anyOf

I'm trying to detect if the user specified a boolean as a string instead of a real boolean.
I'm testing commentsModule/enabled to see if the value is false, once with quotes and once without.
The online validator: http://json-schema-validator.herokuapp.com/ works correctly, and identifies the failure as "instance value (\"false\") not found in enum (possible values: [false])".
However, Newtonsoft Json.NET (latest version), given exactly the same schema and JSON, reports the JSON as valid.
Schema:
{
"$schema":"http://json-schema.org/draft-04/schema#",
"description": "pages json",
"type": "object",
"properties":
{
"name": {"type":"string"},
"description": {"type":"string"},
"channel": {"type":"string"},
"commentsModule":{
"type": "object",
"anyOf":[
{ "$ref": "#/definitions/commentsModuleDisabled" }
]
}
},
"definitions":{
"commentsModuleDisabled":{
"required": [ "enabled" ],
"properties": {
"enabled": { "type": "boolean", "enum": [ false ] }
}
}
}
}
(using oneOf gives the same result)
JSON:
{
"_id": {
"$oid": "530dfec1e4b0ee95f0f3ce11"
},
"pageId": 1234,
"pageType": "Show",
"name": "my name",
"description": "this is decription.” ",
"channel": "tech",
"commentsModule": {
"CaptionFieldDoesntExist": "Comments",
"enabled": "false"
},
"localInstance": "com",
"productionYear": "2014",
"navbarCaptionLink": "",
"logoAd": ""
}
Json.Net code (taken from the official site):
JsonSchema schema = JsonSchema.Parse(schemaJson);
JObject jsonToVerify = JObject.Parse(json);
IList<string> messages;
bool valid = jsonToVerify.IsValid(schema, out messages);
Thank you!
EDIT:
Json.NET doesn't support JSON Schema v4, so the "definitions" references are ignored.
For example, in this case "caption" must have a minimum length of 1 and has length 0, yet Json.NET passes validation:
JSON
{
"_id": {
"$oid": "530dfec1e4b0ee95f0f3ce11"
},
"pageId": 1234,
"pageType": "Show",
"name": "another name",
"description": "description ",
"channel": "tech",
"commentsModule": {
"caption": "",
"enabled": true
},
"localInstance": "com",
"productionYear": "2014",
"navbarCaptionLink": "",
"logoAd": "" }
Schema:
{
"$schema":"http://json-schema.org/draft-04/schema#",
"description": "pages json",
"type": "object",
"properties":
{
"name": {"type":"string"},
"description": {"type":"string"},
"channel": {"type":"string"},
"commentsModule":{
"type": "object",
"oneOf":[
{ "$ref": "#/definitions/commentsModuleDisabled" },
{ "$ref": "#/definitions/commentsModuleEnabled" }
]
}
},
"definitions":{
"commentsModuleDisabled":{
"required": [ "enabled" ],
"properties": {
"enabled": { "type": "boolean", "enum": [ false ] }
}
},
"commentsModuleEnabled":{
"required": [ "enabled", "caption" ],
"properties": {
"enabled": { "type": "boolean", "enum": [ true ] },
"caption": { "type": "string", "minLength": 1 }
}
}
} }
In this case, the online tool's error reports mismatches against both schemas and refers to the minimum length requirement:
"message" : "instance failed to match exactly one schema (matched 0 out of 2)"
... "message" : "string \"\" is too short (length: 0, required minimum: 1)",
Json.NET doesn't support JSON Schema v4, only v3. That's why "anyOf" and "definitions" are not recognized and the validation passes.
Update:
Json.NET Schema (the separate Newtonsoft.Json.Schema package) has full support for Draft 4.
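To illustrate what a Draft 4 validator does with these schemas, here is a minimal Python sketch that implements only the keywords used above: type, enum, minLength, required, properties, anyOf, oneOf and local $ref. The errors and resolve helpers are hypothetical teaching code, not Json.NET or Json.NET Schema, and they skip most of the specification (no type arrays, no additionalProperties, and enum uses plain Python equality).

```python
def resolve(ref, root):
    """Resolve a local JSON Pointer such as '#/definitions/name'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported: {ref}")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


# JSON type name -> Python check; bool is excluded from integers because
# Python treats True/False as ints.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def errors(instance, schema, root):
    """Return a list of error messages; an empty list means valid."""
    if "$ref" in schema:  # in draft 4, keywords next to $ref are ignored
        return errors(instance, resolve(schema["$ref"], root), root)
    errs = []
    kind = schema.get("type")
    if kind is not None and not TYPE_CHECKS[kind](instance):
        errs.append(f"{instance!r} is not of type {kind}")
    if "enum" in schema and instance not in schema["enum"]:
        errs.append(f"{instance!r} not in enum {schema['enum']}")
    if isinstance(instance, str) and len(instance) < schema.get("minLength", 0):
        errs.append(f"{instance!r} is too short")
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errs.append(f"missing required property {name!r}")
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errs.extend(errors(instance[name], sub, root))
    if "anyOf" in schema:
        if not any(not errors(instance, s, root) for s in schema["anyOf"]):
            errs.append("instance failed to match at least one schema")
    if "oneOf" in schema:
        matched = sum(not errors(instance, s, root) for s in schema["oneOf"])
        if matched != 1:
            errs.append(f"instance failed to match exactly one schema (matched {matched})")
    return errs


disabled = {"required": ["enabled"],
            "properties": {"enabled": {"type": "boolean", "enum": [False]}}}
schema = {
    "type": "object",
    "properties": {"commentsModule": {
        "type": "object",
        "anyOf": [{"$ref": "#/definitions/commentsModuleDisabled"}]}},
    "definitions": {"commentsModuleDisabled": disabled},
}
print(errors({"commentsModule": {"enabled": "false"}}, schema, schema))
print(errors({"commentsModule": {"enabled": False}}, schema, schema))  # []
```

A validator that does not understand anyOf or $ref would skip those branches entirely, which is why the string "false" slips through.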