How to define an Avro schema for a complex JSON document?

I have a JSON document that I would like to convert to Avro, and I need to specify a schema for that purpose. Here is the JSON document for which I would like to define the Avro schema:
{
"uid": 29153333,
"somefield": "somevalue",
"options": [
{
"item1_lvl2": "a",
"item2_lvl2": [
{
"item1_lvl3": "x1",
"item2_lvl3": "y1"
},
{
"item1_lvl3": "x2",
"item2_lvl3": "y2"
}
]
}
]
}
I'm able to define the schema for the non-complex types but not for the complex "options" field:
{
"namespace" : "my.com.ns",
"type" : "record",
"fields" : [
{"name": "uid", "type": "int"},
{"name": "somefield", "type": "string"},
{"name": "options", "type": .....}
]
}
Thanks for the help!

You need to use Avro complex types, specifically arrays and records, and nest them together:
{
"namespace" : "my.com.ns",
"name": "myrecord",
"type" : "record",
"fields" : [
{"name": "uid", "type": "int"},
{"name": "somefield", "type": "string"},
{"name": "options", "type": {
"type": "array",
"items": {
"type": "record",
"name": "lvl2_record",
"fields": [
{"name": "item1_lvl2", "type": "string"},
{"name": "item2_lvl2", "type": {
"type": "array",
"items": {
"type": "record",
"name": "lvl3_record",
"fields": [
{"name": "item1_lvl3", "type": "string"},
{"name": "item2_lvl3", "type": "string"}
]
}
}}
]
}
}}
]
}
Also, to improve readability, you can split the schema into multiple files.

This online tool (http://avro4s-ui.landoop.com/) is very practical: it generates an Avro schema from a given valid JSON document.
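If you would rather see how such an inference works than rely on a tool, here is a minimal sketch in plain Python. It is not how the tool above is implemented; infer_avro_type and the "_record" naming rule are made up for illustration, and it assumes arrays are non-empty and homogeneous and that the document contains no nulls.

```python
import json

def infer_avro_type(value, name):
    """Map a parsed JSON value to an Avro type (hypothetical helper, not a library API)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: the array is non-empty and homogeneous,
        # so its first element is representative of all items.
        return {"type": "array", "items": infer_avro_type(value[0], name)}
    if isinstance(value, dict):
        return {
            "type": "record",
            "name": name + "_record",  # assumed naming rule for nested records
            "fields": [{"name": k, "type": infer_avro_type(v, k)} for k, v in value.items()],
        }
    raise TypeError(f"unsupported JSON value: {value!r}")

# The JSON document from the question.
doc = json.loads("""
{
  "uid": 29153333,
  "somefield": "somevalue",
  "options": [
    {
      "item1_lvl2": "a",
      "item2_lvl2": [
        {"item1_lvl3": "x1", "item2_lvl3": "y1"},
        {"item1_lvl3": "x2", "item2_lvl3": "y2"}
      ]
    }
  ]
}
""")

schema = infer_avro_type(doc, "root")
print(json.dumps(schema, indent=2))
```

The result has the same shape as the hand-written schema in the accepted answer: an array of records for options, with a nested array of records for item2_lvl2. You would still want to add a namespace and review the generated names yourself.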

Related

Unable to create sample data for Avro schema: error creating a Kafka message to producer - Expected start-union. Got VALUE_STRING

{
"namespace": "de.morris.audit",
"type": "record",
"name": "AuditDataChangemorris",
"fields": [
{"name": "employeeID", "type": "string"},
{"name": "employeeNumber", "type": ["null", "string"], "default": null},
{"name": "serialNumbers", "type": [ "null", {"type": "array", "items": "string"}]},
{"name": "correlationId", "type": "string"},
{"name": "timestamp", "type": "long", "logicalType": "timestamp-millis"},
{"name": "employmentscreening","type":{"type": "enum", "name": "employmentscreening", "symbols": ["NO","YES"]}},
{"name": "vouchercodes","type": ["null",
{
"type": "array",
"items": {
"name": "Vouchercodes",
"type": "record",
"fields": [
{"name": "voucherName","type": ["null","string"], "default": null},
{"name": "authocode","type": ["null","string"], "default": null}
]
}
}], "default": null}
]
}
When I try to create sample data in JSON format based on the above avsc for a Kafka consumer, I get the error below when testing:
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {
"type": "array",
"items": ["363536623","5846373733"]
},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": [
{
"voucherName": "skygo",
"authocode": "A238472ASD"
}
]
}
I got the error below when I ran the Dataflow job in GCP:
Error message from worker: java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"serialnumbers","message":"Array specified for non-repeated field: serialnumbers.","reason":"invalid"}],"index":0}]
How do I create correct sample data based on the above schema?
Read the spec
The value of a union is encoded in JSON as follows:
if its type is null, then it is encoded as a JSON null;
otherwise it is encoded as a JSON object with one name/value pair whose name is the type’s name and whose value is the recursively encoded value
So, here's the data it expects.
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {"array": [
"serialNumbers3521"
]},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": {"array": [
{
"voucherName": {"string": "skygo"},
"authocode": {"string": "A238472ASD"}
}
]}
}
With this schema
{
"namespace": "de.morris.audit",
"type": "record",
"name": "AuditDataChangemorris",
"fields": [
{
"name": "employeeID",
"type": "string"
},
{
"name": "employeeNumber",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "serialNumbers",
"type": [
"null",
{
"type": "array",
"items": "string"
}
]
},
{
"name": "correlationId",
"type": "string"
},
{
"name": "timestamp",
"type": {
"type": "long",
"logicalType": "timestamp-millis"
}
},
{
"name": "employmentscreening",
"type": {
"type": "enum",
"name": "employmentscreening",
"symbols": [
"NO",
"YES"
]
}
},
{
"name": "vouchercodes",
"type": [
"null",
{
"type": "array",
"items": {
"name": "Vouchercodes",
"type": "record",
"fields": [
{
"name": "voucherName",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "authocode",
"type": [
"null",
"string"
],
"default": null
}
]
}
}
],
"default": null
}
]
}
Here's an example of producing to and consuming from Kafka:
$ jq -rc < /tmp/data.json | kafka-avro-console-producer --topic foobar --property value.schema="$(jq -rc < /tmp/data.avsc)" --bootstrap-server localhost:9092 --sync
$ kafka-avro-console-consumer --topic foobar --from-beginning --bootstrap-server localhost:9092 | jq
{
"employeeID": "qtete46524",
"employeeNumber": {
"string": "custnumber9813"
},
"serialNumbers": {
"array": [
"serialNumbers3521"
]
},
"correlationId": "corr-656532443",
"timestamp": 1476538955719,
"employmentscreening": "NO",
"vouchercodes": {
"array": [
{
"voucherName": {
"string": "skygo"
},
"authocode": {
"string": "A238472ASD"
}
}
]
}
}
^CProcessed a total of 1 messages
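The union rule quoted from the spec is mechanical enough to script. Below is a rough sketch that walks a schema and wraps every non-null union value in a single-key object. The helper names (branch_name, fits, encode) are invented for this example, the schema is a trimmed copy of the one above, and the type matching is deliberately naive, so treat it as an illustration of the rule rather than a replacement for an Avro library.

```python
import json

# Python types that each Avro type is assumed to accept in this sketch.
PY_TYPES = {"string": str, "int": int, "long": int, "boolean": bool,
            "array": list, "record": dict}

def branch_name(branch):
    """JSON key used for a union branch: the type's name."""
    if isinstance(branch, str):
        return branch                      # primitive, e.g. "string"
    if branch["type"] in ("record", "enum", "fixed"):
        return branch["name"]              # named types use their own name
    return branch["type"]                  # "array" or "map"

def fits(branch, value):
    """Naive check that a Python value could belong to a union branch."""
    kind = branch if isinstance(branch, str) else branch["type"]
    return isinstance(value, PY_TYPES.get(kind, ()))

def encode(schema, value):
    """Convert plain JSON data into Avro's JSON encoding for the given schema."""
    if isinstance(schema, list):           # a union
        if value is None:
            return None                    # null is encoded as JSON null
        branch = next(b for b in schema if b != "null" and fits(b, value))
        return {branch_name(branch): encode(branch, value)}
    if isinstance(schema, dict):
        if schema["type"] == "record":
            return {f["name"]: encode(f["type"], value.get(f["name"]))
                    for f in schema["fields"]}
        if schema["type"] == "array":
            return [encode(schema["items"], v) for v in value]
    return value                           # primitives, enums, logical types

nullable_string = ["null", "string"]
schema = {
    "type": "record", "name": "AuditDataChangemorris",
    "fields": [
        {"name": "employeeID", "type": "string"},
        {"name": "employeeNumber", "type": nullable_string},
        {"name": "serialNumbers", "type": ["null", {"type": "array", "items": "string"}]},
        {"name": "vouchercodes", "type": ["null", {"type": "array", "items": {
            "type": "record", "name": "Vouchercodes",
            "fields": [{"name": "voucherName", "type": nullable_string},
                       {"name": "authocode", "type": nullable_string}]}}]},
    ],
}

plain = {
    "employeeID": "qtete46524",
    "employeeNumber": "custnumber9813",
    "serialNumbers": ["363536623", "5846373733"],
    "vouchercodes": [{"voucherName": "skygo", "authocode": "A238472ASD"}],
}

encoded = encode(schema, plain)
print(json.dumps(encoded, indent=2))
```

Fields that are not unions (employeeID here) stay as plain values; only values whose declared type is a union get the extra wrapper object.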

Issues using minItems and uniqueItems: true to generate multiple enums

I am trying to create a JSON schema that validates a JSON object which can have more than one enum value.
I have created a JSON schema that should validate the JSON object, but I am having trouble making sure the schema accepts more than one enum value.
When I use minItems: 3, three enum values are generated, but they are duplicates. When I add uniqueItems: true, only one enum value is returned instead of three unique ones.
My JSON schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$id": "https://mrlee.com/player.schema.json",
"title": "The lee schema",
"type": "object",
"properties": {
"AccountID":{
"description": "The unique identifier for a user",
"type": "integer"
},
"Permissions":{
"type": "object",
"properties":{
"granted":{
"type": "array",
"items": {
"type":"string",
"enum":[
"canLoginWithPassword",
"canVerify",
"canGame",
"canJump"
]
},
"minItems" : 4,
"uniqueItems": true
},
"failedConditions":{
"type": "array",
"items":[
{"type":"object",
"properties":{
"name":{
"type": "string"
},
"description":{
"type": "string"
},
"details":{
"type": "object",
"properties":{
"reason":{
"type": "string"
}
},
"required": [ "reason"]
},
"denied":{
"type": "array",
"items":{
"type":"string",
"enum":[
"canLoginWithPassword",
"canLoginWithPassword",
"canVerify",
"canGame",
"canJump"
]
},
"minItems":1,
"maxItems": 5
}
},
"required": [ "name", "description", "details","denied" ] }
]
}
},
"required": [ "granted", "failedConditions" ]
}
},
"required": [ "AccountID", "Permissions"]
}
To test that my JSON schema works, I use this link. I would be glad to be pointed in the right direction, as I am new to JSON and JSON Schema.
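For reference, the three constraints on granted (enum on each item, minItems, uniqueItems) are independent checks, and an array with all four permissions satisfies them together. The sketch below spells them out by hand in Python; it is not a JSON Schema validator and says nothing about how any particular sample-data generator behaves. is_valid_granted and sample_granted are hypothetical names.

```python
import random

# Allowed values, copied from the "granted" enum in the schema above.
ALLOWED = ["canLoginWithPassword", "canVerify", "canGame", "canJump"]

def is_valid_granted(items, min_items=4):
    """Hand-written equivalent of enum + minItems + uniqueItems for 'granted'."""
    return (
        len(items) >= min_items                 # minItems
        and len(set(items)) == len(items)       # uniqueItems: true
        and all(i in ALLOWED for i in items)    # enum on every item
    )

def sample_granted(count):
    """Draw 'count' distinct permissions; random.sample never repeats a value."""
    return random.sample(ALLOWED, count)

print(is_valid_granted(sample_granted(4)))      # four distinct values
print(is_valid_granted(["canGame"] * 4))        # duplicates break uniqueItems
```

Note that with four enum values and minItems set to 4, the only valid arrays are permutations of the full list, so a generator that picks values with replacement will rarely produce one.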

How to write JSON into an Avro schema in NiFi

I want to write an Avro schema. The JSON looks like this:
{
"manager": {
"employeeId": "ref-456",
"name": "John Doe"
}
}
This is the schema I have written, but it's wrong. How can I correct it?
{
"namespace":"nifi",
"name":"store_event",
"type":"record",
"fields":[ {
"name" : "manager",
"type" : [
{"name":"employeeId", "type":"string"},
{"name":"name", "type":"string"}
]
}
]
}
Here it is:
{
"type": "record",
"name": "nifiRecord",
"namespace": "org.apache.nifi",
"fields": [
{
"name": "manager",
"type": [
"null",
{
"type": "record",
"name": "managerType",
"fields": [
{
"name": "employeeId",
"type": [
"null",
"string"
]
},
{
"name": "name",
"type": [
"null",
"string"
]
}
]
}
]
}
]
}
You can actually infer the schema quite easily (I did this using your JSON payload).
Use ConvertRecord with a JsonTreeReader (Infer Schema) and a JsonRecordSetWriter (Set 'avro.schema' Attribute); the attribute will tell you the schema.

JSON Schema reporting error only for first element of array

I have the below JSON document.
[
{
"name": "aaaa",
"data": {
"key": "id",
"value": "aaaa"
}
},
{
"name": "bbbb",
"data": {
"key": "id1",
"value": "bbbb"
}
}
]
Below is the JSON Schema I have created for the above content.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "array",
"items": [
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"data": {
"type": "object",
"properties": {
"key": {
"type": "string",
"enum": [
"id",
"temp"
]
},
"value": {
"type": "string"
}
},
"required": [
"key",
"value"
]
}
},
"required": [
"name",
"data"
]
}
]
}
According to the schema, the value of data.key is invalid for the second item in the array, but no online schema validator detects that. If we use a different value in the first array element, the expected error is reported.
I assume that my schema is wrong somehow. What I expect is that any item of the array is reported if it has a value outside the enum list.
It's an easy mistake to make, so don't beat yourself up about this one!
items can be an array or an object. If it's an array, each schema in it validates only the instance element at the same position, so a one-element items array checks just the first item. Here's an excerpt from the JSON Schema spec (draft-7):
The value of "items" MUST be either a valid JSON Schema or an array of
valid JSON Schemas.
If "items" is a schema, validation succeeds if all elements in the
array successfully validate against that schema.
If "items" is an array of schemas, validation succeeds if each element
of the instance validates against the schema at the same position, if
any.
JSON Schema (validation) draft-7 items
Removing the square brackets gives you the correct schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "array",
"items":
{
"type": "object",
"properties": {
"name": {
"type": "string"
},
"data": {
"type": "object",
"properties": {
"key": {
"type": "string",
"enum": [
"id",
"temp"
]
},
"value": {
"type": "string"
}
},
"required": [
"key",
"value"
]
}
},
"required": [
"name",
"data"
]
}
}
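To make the difference concrete, here is a small Python sketch of just the items rule from the excerpt. It is not a real validator: key_is_allowed is a hypothetical stand-in that checks only data.key against the enum, and invalid_indexes is an invented helper that applies either the positional (array) form or the single-schema (object) form.

```python
# The per-element schema from the question, reduced to the parts used here.
element_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "data": {
            "type": "object",
            "properties": {
                "key": {"type": "string", "enum": ["id", "temp"]},
                "value": {"type": "string"},
            },
        },
    },
}

instance = [
    {"name": "aaaa", "data": {"key": "id", "value": "aaaa"}},
    {"name": "bbbb", "data": {"key": "id1", "value": "bbbb"}},
]

def key_is_allowed(schema, element):
    """Stand-in for full validation: checks only data.key against its enum."""
    allowed = schema["properties"]["data"]["properties"]["key"]["enum"]
    return element["data"]["key"] in allowed

def invalid_indexes(items, instance):
    """Return the positions of instance elements that fail under 'items'."""
    if isinstance(items, list):
        # Array form: schema i applies only to element i; zip stops at the
        # shorter sequence, so extra elements are never checked.
        pairs = zip(items, instance)
    else:
        # Object form: the same schema applies to every element.
        pairs = ((items, element) for element in instance)
    return [i for i, (schema, element) in enumerate(pairs)
            if not key_is_allowed(schema, element)]

print(invalid_indexes([element_schema], instance))  # array form, as in the question
print(invalid_indexes(element_schema, instance))    # object form, as in the fix
```

With the array form only index 0 is ever looked at, which is why the bad key in the second element went unnoticed.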

How to define avro schema for complex json document?

This is an example of my JSON:
{"ID":2,"name":"Donatello","lastname":"Di Niccoló","age":23,"hobbies":["reading","dancing",{"sports":["rafting","baseball"]}],"address":{"street":"Tepito", "number":"77", "districts":"Benito Juárez", "country": "CDMX"}}
This is my Avro schema:
{"type":"record","name":"myrecord","fields":[
{"name":"ID","type":"int"},
{"name":"name", "type": "string"},
{"name":"lastname", "type": "string"},
{"name":"age", "type": "int"},
{"name":"hobbies","type": {
"type": "array",
"items": {
"type": "array", "items": "string",
"type":"record","name":"myhobbies",
"fields":[
{"name":"sports","type":{"type": "array", "items": "string"}}
]
} }
},
{"name":"address","type":{"type":"record","name":"myaddress",
"fields":[
{"name":"street","type":"string"},
{"name":"number","type":"string"},
{"name":"districts","type":"string"},
{"name":"country","type":"string"}
]
}
}
]}
I need the Avro format because I want to start a producer with Kafka, but when I start it and enter the record above, I get an error because the Avro schema does not match the record. How can I make them match?
Yeah Nitin Tripathi
{"type":"record","name":"myrecord","fields":[
{"name":"ID","type":"int"},
{"name":"name", "type": "string"},
{"name":"lastname", "type": "string"},
{"name":"age", "type": "int"},
{"name":"hobbies","type": {
"type": "array",
"items": {
"type":"record","name":"myhobbies",
"fields":[
{"name":"sports","type":{"type": "array", "items": "string"}}
]
} }
},
{"name":"address","type":{"type":"record","name":"myaddress",
"fields":[
{"name":"street","type":"string"},
{"name":"number","type":"string"},
{"name":"districts","type":"string"},
{"name":"country","type":"string"}
]
}
}
]}
I tried it, but it does not work :(
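Arrays use the type name "array" and support a single attribute, items, which gives the schema of the array's elements. The hobbies data, however, mixes plain strings with a myhobbies record, so a single record schema for items cannot match it; items would have to be a union of string and myhobbies.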
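One way to model that mix, assuming a union is acceptable to your consumers, is to declare items as a union of string and the myhobbies record. The sketch below builds that field definition as a Python dict and shows how each hobbies element would look under Avro's JSON encoding of unions, where non-null union values are wrapped in an object keyed by the type name. wrap_hobby is a hypothetical helper, and whether your producer expects this wrapped form is an assumption you should verify.

```python
import json

# Field definition with a union as the array's item type.
hobbies_field = {
    "name": "hobbies",
    "type": {
        "type": "array",
        "items": [
            "string",
            {
                "type": "record",
                "name": "myhobbies",
                "fields": [
                    {"name": "sports", "type": {"type": "array", "items": "string"}}
                ],
            },
        ],
    },
}

def wrap_hobby(element):
    """Wrap one hobbies element in Avro's JSON union encoding."""
    if isinstance(element, str):
        return {"string": element}      # primitive branch: keyed by type name
    return {"myhobbies": element}       # record branch: keyed by record name

hobbies = ["reading", "dancing", {"sports": ["rafting", "baseball"]}]
encoded = [wrap_hobby(h) for h in hobbies]

print(json.dumps(hobbies_field, indent=2))
print(json.dumps(encoded))
```

If changing the data is an option, a simpler design is to keep hobbies as a plain array of strings and move sports into its own field, which avoids the union altogether.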