How to define avro schema for complex json document?

This is an example of my JSON:
{"ID":2,"name":"Donatello","lastname":"Di Niccoló","age":23,"hobbies":["reading","dancing",{"sports":["rafting","baseball"]}],"address":{"street":"Tepito", "number":"77", "districts":"Benito Juárez", "country": "CDMX"}}
This is my Avro schema:
{"type":"record","name":"myrecord","fields":[
{"name":"ID","type":"int"},
{"name":"name", "type": "string"},
{"name":"lastname", "type": "string"},
{"name":"age", "type": "int"},
{"name":"hobbies","type": {
"type": "array",
"items": {
"type": "array", "items": "string",
"type":"record","name":"myhobbies",
"fields":[
{"name":"sports","type":{"type": "array", "items": "string"}}
]
} }
},
{"name":"address","type":{"type":"record","name":"myaddress",
"fields":[
{"name":"street","type":"string"},
{"name":"number","type":"string"},
{"name":"districts","type":"string"},
{"name":"country","type":"string"}
]
}
}
]}
I need the Avro format because I want to start a Kafka producer, but when I start it and enter the record above, I get an error because the Avro schema does not match the record. How can I make them match?
Yes, Nitin Tripathi, this is the version I tried:
{"type":"record","name":"myrecord","fields":[
{"name":"ID","type":"int"},
{"name":"name", "type": "string"},
{"name":"lastname", "type": "string"},
{"name":"age", "type": "int"},
{"name":"hobbies","type": {
"type": "array",
"items": {
"type":"record","name":"myhobbies",
"fields":[
{"name":"sports","type":{"type": "array", "items": "string"}}
]
} }
},
{"name":"address","type":{"type":"record","name":"myaddress",
"fields":[
{"name":"street","type":"string"},
{"name":"number","type":"string"},
{"name":"districts","type":"string"},
{"name":"country","type":"string"}
]
}
}
]}
I tried it, but it does not work :(

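Arrays use the type name "array" and support a single attribute, items. However, your hobbies array mixes plain strings ("reading", "dancing") with an object ({"sports": [...]}), so its items schema has to be a union of the string type and the myhobbies record type rather than either one alone.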
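To make the mismatch concrete, here is a small Python sketch. The `matches` helper is hypothetical and only understands the Avro subset used in this question (int, string, array, record, and unions written as JSON lists); it is not the Avro library's own validation. It compares the second attempt, whose items are only the myhobbies record, with a union of "string" and that record.

```python
import json

# The document from the question, parsed with the standard json module.
DOC = json.loads(
    '{"ID":2,"name":"Donatello","lastname":"Di Niccoló","age":23,'
    '"hobbies":["reading","dancing",{"sports":["rafting","baseball"]}],'
    '"address":{"street":"Tepito","number":"77","districts":"Benito Juárez","country":"CDMX"}}'
)

MYHOBBIES = {"type": "record", "name": "myhobbies",
             "fields": [{"name": "sports", "type": {"type": "array", "items": "string"}}]}

def make_schema(hobby_items):
    """Build the question's record schema with a pluggable "hobbies" item type."""
    return {
        "type": "record", "name": "myrecord",
        "fields": [
            {"name": "ID", "type": "int"},
            {"name": "name", "type": "string"},
            {"name": "lastname", "type": "string"},
            {"name": "age", "type": "int"},
            {"name": "hobbies", "type": {"type": "array", "items": hobby_items}},
            {"name": "address", "type": {
                "type": "record", "name": "myaddress",
                "fields": [{"name": n, "type": "string"}
                           for n in ("street", "number", "districts", "country")]}},
        ],
    }

def matches(schema, value):
    """Hypothetical checker for the Avro subset used here (int, string, array,
    record, and unions written as JSON lists). Not a full Avro validator."""
    if isinstance(schema, list):  # union: the value must fit one branch
        return any(matches(branch, value) for branch in schema)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "string":
        return isinstance(value, str)
    if kind == "array":
        return isinstance(value, list) and all(matches(schema["items"], v) for v in value)
    if kind == "record":
        names = [f["name"] for f in schema["fields"]]
        return (isinstance(value, dict) and set(value) == set(names)
                and all(matches(f["type"], value[f["name"]]) for f in schema["fields"]))
    raise ValueError(f"unsupported type: {kind!r}")

# Second attempt from the question: items are only the myhobbies record.
print(matches(make_schema(MYHOBBIES), DOC))              # False: "reading" is not a record
# A union of string and the record accepts both kinds of hobby.
print(matches(make_schema(["string", MYHOBBIES]), DOC))  # True
```

One more thing to check on the producer side: per the Avro specification's JSON encoding, a non-null union value is written as a one-key object naming its branch. If your producer reads its input in that encoding, the hobbies value would have to be written as [{"string": "reading"}, {"string": "dancing"}, {"myhobbies": {"sports": ["rafting", "baseball"]}}] rather than as the plain array.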

Related

Issues using minItems and uniqueItems: true to generate multiple enums

I am trying to create a JSON schema that validates a JSON object which can have more than one enum value.
I have successfully created a JSON schema that should validate such an object, but I can't get the schema to accept more than one enum value.
When I use "minItems": 3, three enum values are generated, but they are duplicates. When I add "uniqueItems": true, it returns only one enum value instead of three unique ones.
My JSON schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$id": "https://mrlee.com/player.schema.json",
"title": "The lee schema",
"type": "object",
"properties": {
"AccountID":{
"description": "The unique identifier for a user",
"type": "integer"
},
"Permissions":{
"type": "object",
"properties":{
"granted":{
"type": "array",
"items": {
"type":"string",
"enum":[
"canLoginWithPassword",
"canVerify",
"canGame",
"canJump"
]
},
"minItems" : 4,
"uniqueItems": true
},
"failedConditions":{
"type": "array",
"items":[
{"type":"object",
"properties":{
"name":{
"type": "string"
},
"description":{
"type": "string"
},
"details":{
"type": "object",
"properties":{
"reason":{
"type": "string"
}
},
"required": [ "reason"]
},
"denied":{
"type": "array",
"items":{
"type":"string",
"enum":[
"canLoginWithPassword",
"canVerify",
"canGame",
"canJump"
]
},
"minItems":1,
"maxItems": 5
}
},
"required": [ "name", "description", "details","denied" ] }
]
}
},
"required": [ "granted", "failedConditions" ]
}
},
"required": [ "AccountID", "Permissions"]
}
To test that my JSON schema works, I use this link. I would be glad to be pointed in the right direction, as I am new to JSON and JSON Schema.
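A quick way to see what the "granted" keywords allow together is to check them by hand. The sketch below is plain Python with a hypothetical `granted_ok` helper (not a JSON Schema library) that applies only the enum, minItems and uniqueItems constraints from the schema above, using the schema's value of 4 for minItems.

```python
import itertools

ENUM = ["canLoginWithPassword", "canVerify", "canGame", "canJump"]
MIN_ITEMS = 4  # value taken from the schema above; the question also mentions 3

def granted_ok(values):
    """Check only the three keywords on "granted": enum (via items), minItems, uniqueItems."""
    return (all(v in ENUM for v in values)          # every item is an enum value
            and len(values) >= MIN_ITEMS            # minItems
            and len(set(values)) == len(values))    # uniqueItems

print(granted_ok(["canGame"] * 4))         # False: duplicates break uniqueItems
print(granted_ok(["canGame", "canJump"]))  # False: fewer than minItems

# Arrays with repeats always fail uniqueItems, so enumerating ordered selections
# of distinct enum values covers every candidate that could pass.
valid = [list(p) for n in range(len(ENUM) + 1)
         for p in itertools.permutations(ENUM, n) if granted_ok(list(p))]
print(len(valid))                          # 24: every ordering of all four values
```

Because the enum has exactly four values, "minItems": 4 together with "uniqueItems": true leaves a valid instance no choice but to contain all four; with MIN_ITEMS = 3, arrays of three or four distinct values would pass.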

JSON Schema reference resolution

I have a JSON schema that contains "$ref" tags, and I am trying to get a version of the schema that has the "$ref" tags resolved. I only need to resolve "$ref"s that point to definitions within the same JSON Schema string (i.e. no external resolution is needed).
Is there a library that performs the resolution of the JSON Schema? (I am currently using org.everit.json.schema library, which is great, but I can't find how to do what I need).
For example, my original schema is:
{
"$id": "https://example.com/arrays.schema.json",
"description": "A representation of a person, company, organization, or place",
"title": "complex-schema",
"type": "object",
"properties": {
"fruits": {
"type": "array",
"items": {
"type": "string"
}
},
"vegetables": {
"type": "array",
"items": { "$ref": "#/$defs/veggie" }
}
},
"$defs": {
"veggie": {
"type": "object",
"required": [ "veggieName", "veggieLike" ],
"properties": {
"veggieName": {
"type": "string",
"description": "The name of the vegetable."
},
"veggieLike": {
"type": "boolean",
"description": "Do I like this vegetable?"
}
}
}
}
}
Which would resolve to something like this (notice that "#/$defs/veggie" is replaced by its definition, inserted inline in the schema):
{
"$id": "https://example.com/arrays.schema.json",
"description": "A representation of a person, company, organization, or place",
"title": "complex-schema",
"type": "object",
"properties": {
"fruits": {
"type": "array",
"items": {
"type": "string"
}
},
"vegetables": {
"type": "array",
"items": {
"type": "object",
"required": [ "veggieName", "veggieLike" ],
"properties": {
"veggieName": {
"type": "string",
"description": "The name of the vegetable."
},
"veggieLike": {
"type": "boolean",
"description": "Do I like this vegetable?"
}
}
}
}
}
}
This isn't possible in the general sense, because:
the $ref might be recursive (i.e. reference itself again)
the keywords in the $ref might duplicate some of the keywords in the containing schema, which would cause some logic to be overwritten.
Why do you need to alter the schema in this way? Generally, a JSON Schema implementation will resolve the $refs automatically while evaluating the schema against provided data.
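When neither of those problems occurs, inlining is a short recursive walk. The following is a minimal Python sketch under that assumption: `inline_refs` and `_lookup` are hypothetical helpers, only local "#/..." JSON Pointer fragments are followed (percent-encoded fragments are not decoded), and both problem cases listed above raise ValueError instead of guessing.

```python
import json

def _lookup(root, ref):
    """Follow a local JSON Pointer fragment such as "#/$defs/veggie"."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported: {ref!r}")
    node = root
    for token in ref[2:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # JSON Pointer escapes
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

def inline_refs(schema):
    """Return a new schema with local "$ref"s replaced by their targets.

    Raises ValueError for the two cases where blind inlining is unsafe:
    a recursive $ref, and a $ref that has sibling keywords.
    """
    def walk(node, active):
        if isinstance(node, list):
            return [walk(item, active) for item in node]
        if not isinstance(node, dict):
            return node
        if "$ref" in node:
            ref = node["$ref"]
            if len(node) > 1:
                raise ValueError(f"$ref with sibling keywords: {ref!r}")
            if ref in active:  # this ref is already being expanded above us
                raise ValueError(f"recursive $ref: {ref!r}")
            return walk(_lookup(schema, ref), active | {ref})
        return {key: walk(value, active) for key, value in node.items()}

    # Drop the definition containers from the output; refs are still
    # looked up in the original schema.
    top = {k: v for k, v in schema.items() if k not in ("$defs", "definitions")}
    return walk(top, frozenset())

example = json.loads("""{
  "type": "object",
  "properties": {"vegetables": {"type": "array", "items": {"$ref": "#/$defs/veggie"}}},
  "$defs": {"veggie": {"type": "object", "required": ["veggieName", "veggieLike"]}}
}""")
print(json.dumps(inline_refs(example), indent=2))
```

Applied to the full veggie schema from the question, the "vegetables" items become the inline veggie object and the "$defs" block is dropped, matching the expected output shown above. The sketch also treats any key named "$ref" as a reference, even a property literally called "$ref", which a real implementation would need to distinguish.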

Json Schema different input formats

I'm creating some models in AWS API Gateway. I'm having trouble with one that should accept two input formats: one is just a dictionary, the other is an array of dictionaries:
{
"id":"",
"name":""
}
and
[
{
"id":"",
"Family":""
},
{
"id":"",
"Family":""
},
...
{
"id":"",
"Family":""
}
]
Until now I've created the model to accept only the dictionary way:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Update",
"type": "object",
"properties": {
"id": { "type": "string"},
"name": { "type": "string"}
},
"required": ["id"]
}
Can you give me some tips on accepting the array of dictionaries as well, please? I've done some research without finding an answer; I'm looking into the oneOf and anyOf keywords, but I'm not sure.
You're on the right track with anyOf. What you should do depends on the similarity between the object (dictionary) that's by itself and the object that's in the array. They look different in your example, so I'll answer in kind, then show how to simplify things if they are in fact the same.
To use anyOf, you want to capture the keywords that define your dictionary:
{
"type": "object",
"properties": {
"id": { "type": "string"},
"name": { "type": "string"}
},
"required": ["id"]
}
and wrap that inside an anyOf right at the root level of the schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Update",
"anyOf": [
{
"type": "object",
"properties": {
"id": { "type": "string"},
"name": { "type": "string"}
},
"required": ["id"]
}
]
}
To write a schema for an array of objects of the same kind, you need the items keyword:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"id": { "type": "string"},
"Family": { "type": "string"}
},
"required": ["id"]
}
}
Add this in as a second element in the anyOf array, and you're golden.
If your lone object can have the same schema as your array-element object, then you can write that schema once as a definition and reference it in both places.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Update",
"definitions": {
"myObject": {
"type": "object",
"properties": {
"id": { "type": "string"},
"name": { "type": "string"}
},
"required": ["id"]
}
},
"anyOf": [
{ "$ref": "#/definitions/myObject" },
{
"type": "array",
"items": { "$ref": "#/definitions/myObject" }
}
]
}
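To check that the combined schema behaves as described, here is a hedged Python sketch. `validate` is a hypothetical helper covering only the keywords this answer uses (type, properties, required, items as a single schema, anyOf, and "$ref" of the assumed form "#/definitions/<name>"); in practice a full JSON Schema implementation would do this job.

```python
def validate(schema, instance, root=None):
    """Hypothetical checker for the keywords used in this answer only.
    Not a complete draft-04 validator."""
    root = schema if root is None else root
    if "$ref" in schema:  # assumed form: "#/definitions/<name>"
        name = schema["$ref"].split("/")[-1]
        return validate(root["definitions"][name], instance, root)
    if "anyOf" in schema and not any(validate(s, instance, root) for s in schema["anyOf"]):
        return False
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(instance, dict):
            return False
        if any(key not in instance for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        return all(validate(props[k], v, root) for k, v in instance.items() if k in props)
    if kind == "array":
        return isinstance(instance, list) and all(
            validate(schema.get("items", {}), item, root) for item in instance)
    if kind == "string":
        return isinstance(instance, str)
    return True  # no "type" keyword: nothing more to check here

SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Update",
    "definitions": {
        "myObject": {
            "type": "object",
            "properties": {"id": {"type": "string"}, "name": {"type": "string"}},
            "required": ["id"],
        }
    },
    "anyOf": [
        {"$ref": "#/definitions/myObject"},
        {"type": "array", "items": {"$ref": "#/definitions/myObject"}},
    ],
}

print(validate(SCHEMA, {"id": "1", "name": "a"}))    # True: the lone object
print(validate(SCHEMA, [{"id": "1"}, {"id": "2"}]))  # True: array of objects
print(validate(SCHEMA, [{"name": "no id"}]))         # False: element lacks "id"
```

Since an instance cannot be both an object and an array, at most one branch can ever match here, so oneOf would accept and reject the same inputs as anyOf.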

JSON Schema reporting error only for first element of array

I have the below JSON document.
[
{
"name": "aaaa",
"data": {
"key": "id",
"value": "aaaa"
}
},
{
"name": "bbbb",
"data": {
"key": "id1",
"value": "bbbb"
}
}
]
Below is the JSON Schema I have created for the above content.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"data": {
"type": "object",
"properties": {
"key": {
"type": "string",
"enum": [
"id",
"temp"
]
},
"value": {
"type": "string"
}
},
"required": [
"key",
"value"
]
}
},
"required": [
"name",
"data"
]
}
]
}
As per the schema, the value of data.key is invalid for the second item in the array, but no online schema validator reports that. If we use a different value in the first array element, it throws the expected error.
I assume that my schema is wrong somehow. What I expect is that every child item of the array is reported if it has a value outside the enum list.
It's an easy mistake to make, so don't beat yourself up about this one!
items can be an array or an object. If it's an array, each schema in it validates only the element at the same position in the instance array. Here's an excerpt from the JSON Schema spec (draft-7):
The value of "items" MUST be either a valid JSON Schema or an array of valid JSON Schemas.
If "items" is a schema, validation succeeds if all elements in the array successfully validate against that schema.
If "items" is an array of schemas, validation succeeds if each element of the instance validates against the schema at the same position, if any.
JSON Schema (validation) draft-7 items
Removing the square braces provides you with the correct schema...
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "array",
"items":
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"data": {
"type": "object",
"properties": {
"key": {
"type": "string",
"enum": [
"id",
"temp"
]
},
"value": {
"type": "string"
}
},
"required": [
"key",
"value"
]
}
},
"required": [
"name",
"data"
]
}
}
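The difference between the two forms of items is easy to reproduce. Below is a minimal Python sketch with a hypothetical `check` helper that supports only type, properties, required, enum and items; it is not a full validator, but it follows the spec text quoted above for both forms of items.

```python
def check(schema, instance):
    """Hypothetical checker for type/properties/required/enum/items only."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(instance, dict):
            return False
        if any(k not in instance for k in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        if not all(check(props[k], v) for k, v in instance.items() if k in props):
            return False
    elif kind == "array":
        if not isinstance(instance, list):
            return False
        items = schema.get("items", {})
        if isinstance(items, list):
            # Array form: element i is checked against items[i]; elements past
            # the end of the list are not checked by "items" at all.
            if not all(check(s, e) for s, e in zip(items, instance)):
                return False
        elif not all(check(items, e) for e in instance):  # single-schema form
            return False
    elif kind == "string" and not isinstance(instance, str):
        return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    return True

ELEMENT = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "data": {
            "type": "object",
            "properties": {
                "key": {"type": "string", "enum": ["id", "temp"]},
                "value": {"type": "string"},
            },
            "required": ["key", "value"],
        },
    },
    "required": ["name", "data"],
}

DOC = [
    {"name": "aaaa", "data": {"key": "id", "value": "aaaa"}},
    {"name": "bbbb", "data": {"key": "id1", "value": "bbbb"}},
]

print(check({"type": "array", "items": [ELEMENT]}, DOC))  # True: only DOC[0] is checked
print(check({"type": "array", "items": ELEMENT}, DOC))    # False: DOC[1] has key "id1"
```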

How to define avro schema for complex json document?

I have a JSON document that I would like to convert to Avro and need a schema to be specified for that purpose. Here is the JSON document for which I would like to define the avro schema:
{
"uid": 29153333,
"somefield": "somevalue",
"options": [
{
"item1_lvl2": "a",
"item2_lvl2": [
{
"item1_lvl3": "x1",
"item2_lvl3": "y1"
},
{
"item1_lvl3": "x2",
"item2_lvl3": "y2"
}
]
}
]
}
I'm able to define the schema for the non-complex types but not for the complex "options" field:
{
"namespace" : "my.com.ns",
"type" : "record",
"fields" : [
{"name": "uid", "type": "int"},
{"name": "somefield", "type": "string"}
{"name": "options", "type": .....}
]
}
Thanks for the help!
You need to use Avro complex types, specifically arrays and records. And then nest these together:
{
"namespace" : "my.com.ns",
"name": "myrecord",
"type" : "record",
"fields" : [
{"name": "uid", "type": "int"},
{"name": "somefield", "type": "string"},
{"name": "options", "type": {
"type": "array",
"items": {
"type": "record",
"name": "lvl2_record",
"fields": [
{"name": "item1_lvl2", "type": "string"},
{"name": "item2_lvl2", "type": {
"type": "array",
"items": {
"type": "record",
"name": "lvl3_record",
"fields": [
{"name": "item1_lvl3", "type": "string"},
{"name": "item2_lvl3", "type": "string"}
]
}
}}
]
}
}}
]
}
Also, to improve readability, you can split the schema into multiple files.
This online tool (http://avro4s-ui.landoop.com/) is very practical: you can generate the Avro schema from a given valid JSON document.
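The idea behind such generators can be sketched in a few lines of Python. This is not how that tool is implemented; `infer` is a hypothetical helper that maps JSON objects to records, arrays to arrays, and scalars to primitive types, and names nested records "<field>_record" (an illustrative convention, not an Avro rule). When an array mixes element shapes, as hobbies does in the first question, it emits a union of the distinct element schemas.

```python
import json

def infer(value, name):
    """Hypothetical Avro schema inference for one JSON value.

    `name` is used for a record built from `value`; nested records are named
    "<field>_record" (an illustrative convention, not an Avro rule).
    """
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "long"  # Avro int is 32-bit
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, dict):
        return {"type": "record", "name": name,
                "fields": [{"name": k, "type": infer(v, f"{k}_record")}
                           for k, v in value.items()]}
    if isinstance(value, list):
        if not value:
            raise ValueError(f"cannot infer the items of an empty array ({name})")
        branches = []  # distinct element schemas, in first-seen order
        for item in value:
            schema = infer(item, name)
            if schema not in branches:
                branches.append(schema)
        items = branches[0] if len(branches) == 1 else branches  # union if mixed
        return {"type": "array", "items": items}
    raise TypeError(f"not a JSON value: {value!r}")

doc = json.loads('{"uid": 29153333, "somefield": "somevalue", "options": [{"item1_lvl2": "a", '
                 '"item2_lvl2": [{"item1_lvl3": "x1", "item2_lvl3": "y1"}, '
                 '{"item1_lvl3": "x2", "item2_lvl3": "y2"}]}]}')
print(json.dumps(infer(doc, "myrecord"), indent=2))
```

For this document the result has the same nesting as the hand-written schema above, differing only in record names. The sketch does not enforce Avro's naming rules for field names, does not check union restrictions (such as two different records with the same name), and cannot infer anything from an empty array, so its output is a starting point to review rather than a finished schema.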