How to write a JSON into an Avro Schema in NiFi

I want to write an Avro schema. The JSON looks like this:
{
  "manager": {
    "employeeId": "ref-456",
    "name": "John Doe"
  }
}
This is the schema I have written, but it's wrong. How can I fix it?
{
  "namespace": "nifi",
  "name": "store_event",
  "type": "record",
  "fields": [
    {
      "name": "manager",
      "type": [
        {"name": "employeeId", "type": "string"},
        {"name": "name", "type": "string"}
      ]
    }
  ]
}

Here it is:
{
  "type": "record",
  "name": "nifiRecord",
  "namespace": "org.apache.nifi",
  "fields": [
    {
      "name": "manager",
      "type": [
        "null",
        {
          "type": "record",
          "name": "managerType",
          "fields": [
            {
              "name": "employeeId",
              "type": ["null", "string"]
            },
            {
              "name": "name",
              "type": ["null", "string"]
            }
          ]
        }
      ]
    }
  ]
}
You can actually infer the schema quite easily (I did this using your JSON payload):
Use ConvertRecord with a JsonTreeReader (Infer Schema) + a JsonRecordSetWriter (Set 'avro.schema' Attribute; this will show you the inferred schema).
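If you want to sanity-check the schema outside NiFi, here's a minimal sketch using the Avro Java library (the .avsc file name is an assumption):

import java.io.File;
import org.apache.avro.Schema;

// Minimal check: Schema.Parser throws if the .avsc is not valid Avro.
public class ParseCheck {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(new File("store_event.avsc")); // assumed file name
        System.out.println(schema.toString(true)); // pretty-print the parsed schema
    }
}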

Related

Unable to create sample data for avro schema: Error creating a kafka message to producer - Expected start-union. Got VALUE_STRING

{
  "namespace": "de.morris.audit",
  "type": "record",
  "name": "AuditDataChangemorris",
  "fields": [
    {"name": "employeeID", "type": "string"},
    {"name": "employeeNumber", "type": ["null", "string"], "default": null},
    {"name": "serialNumbers", "type": ["null", {"type": "array", "items": "string"}]},
    {"name": "correlationId", "type": "string"},
    {"name": "timestamp", "type": "long", "logicalType": "timestamp-millis"},
    {"name": "employmentscreening", "type": {"type": "enum", "name": "employmentscreening", "symbols": ["NO", "YES"]}},
    {"name": "vouchercodes", "type": ["null",
      {
        "type": "array",
        "items": {
          "name": "Vouchercodes",
          "type": "record",
          "fields": [
            {"name": "voucherName", "type": ["null", "string"], "default": null},
            {"name": "authocode", "type": ["null", "string"], "default": null}
          ]
        }
      }], "default": null}
  ]
}
I was trying to create sample data in JSON format based on the above avsc for a Kafka consumer. This is what I tested:
{
  "employeeID": "qtete46524",
  "employeeNumber": {
    "string": "custnumber9813"
  },
  "serialNumbers": {
    "type": "array",
    "items": ["363536623", "5846373733"]
  },
  "correlationId": "corr-656532443",
  "timestamp": 1476538955719,
  "employmentscreening": "NO",
  "vouchercodes": [
    {
      "voucherName": "skygo",
      "authocode": "A238472ASD"
    }
  ]
}
I got the below error when I ran the dataflow job in GCP:
Error message from worker: java.lang.RuntimeException: java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"serialnumbers","message":"Array specified for non-repeated field: serialnumbers.","reason":"invalid"}],"index":0}]
How do I create correct sample data based on the above schema?
Read the spec. On the JSON encoding of unions, it says:
The value of a union is encoded in JSON as follows:
if its type is null, then it is encoded as a JSON null;
otherwise it is encoded as a JSON object with one name/value pair whose name is the type’s name and whose value is the recursively encoded value
So, here's the data it expects.
{
  "employeeID": "qtete46524",
  "employeeNumber": {
    "string": "custnumber9813"
  },
  "serialNumbers": {"array": [
    "serialNumbers3521"
  ]},
  "correlationId": "corr-656532443",
  "timestamp": 1476538955719,
  "employmentscreening": "NO",
  "vouchercodes": {"array": [
    {
      "voucherName": {"string": "skygo"},
      "authocode": {"string": "A238472ASD"}
    }
  ]}
}
With this schema:
{
  "namespace": "de.morris.audit",
  "type": "record",
  "name": "AuditDataChangemorris",
  "fields": [
    {
      "name": "employeeID",
      "type": "string"
    },
    {
      "name": "employeeNumber",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "serialNumbers",
      "type": [
        "null",
        {
          "type": "array",
          "items": "string"
        }
      ]
    },
    {
      "name": "correlationId",
      "type": "string"
    },
    {
      "name": "timestamp",
      "type": {
        "type": "long",
        "logicalType": "timestamp-millis"
      }
    },
    {
      "name": "employmentscreening",
      "type": {
        "type": "enum",
        "name": "employmentscreening",
        "symbols": ["NO", "YES"]
      }
    },
    {
      "name": "vouchercodes",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "name": "Vouchercodes",
            "type": "record",
            "fields": [
              {
                "name": "voucherName",
                "type": ["null", "string"],
                "default": null
              },
              {
                "name": "authocode",
                "type": ["null", "string"],
                "default": null
              }
            ]
          }
        }
      ],
      "default": null
    }
  ]
}
Here's an example of producing and consuming to Kafka
$ jq -rc < /tmp/data.json | kafka-avro-console-producer --topic foobar --property value.schema="$(jq -rc < /tmp/data.avsc)" --bootstrap-server localhost:9092 --sync
$ kafka-avro-console-consumer --topic foobar --from-beginning --bootstrap-server localhost:9092 | jq
{
  "employeeID": "qtete46524",
  "employeeNumber": {
    "string": "custnumber9813"
  },
  "serialNumbers": {
    "array": [
      "serialNumbers3521"
    ]
  },
  "correlationId": "corr-656532443",
  "timestamp": 1476538955719,
  "employmentscreening": "NO",
  "vouchercodes": {
    "array": [
      {
        "voucherName": {
          "string": "skygo"
        },
        "authocode": {
          "string": "A238472ASD"
        }
      }
    ]
  }
}
^CProcessed a total of 1 messages
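If you'd rather verify the union encoding without a Kafka cluster, here's a minimal sketch using the Avro Java API to decode the JSON against the schema (the file paths are assumptions):

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

// Decodes the Avro-JSON-encoded record against the schema; a wrong union
// encoding fails here with the same "Expected start-union" error.
public class JsonToAvroCheck {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(new File("/tmp/data.avsc"));
        String json = new String(Files.readAllBytes(Paths.get("/tmp/data.json")));
        GenericRecord record = new GenericDatumReader<GenericRecord>(schema)
                .read(null, DecoderFactory.get().jsonDecoder(schema, json));
        System.out.println(record);
    }
}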

Dynamic avro schema creation

How can I create an Avro schema for the below JSON code?
{
  "_id": "xxxx",
  "name": "xyz",
  "data": {
    "abc": {
      "0001": "gha",
      "0002": "bha"
    }
  }
}
Here, the "0001": "gha" and "0002": "bha" key/value pairs would be dynamic.
Maybe a schema like this does what you want?
{
  "type": "record",
  "name": "MySchema",
  "namespace": "my.name.space",
  "fields": [
    {
      "name": "_id",
      "type": "string"
    },
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "data",
      "type": {
        "type": "record",
        "name": "Data",
        "fields": [
          {
            "name": "abc",
            "type": {
              "type": "map",
              "values": "string"
            }
          }
        ]
      }
    }
  ]
}
It's not dynamic, but you can add as many key-value pairs to the map as you like. Field names starting with a numeric value aren't allowed in Avro.
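For instance, here's a minimal Java sketch populating the map field with dynamic keys via the Avro generic API (the .avsc file name is an assumption):

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class MapExample {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(new File("myschema.avsc")); // assumed file name
        Schema dataSchema = schema.getField("data").schema();

        // The map accepts arbitrary string keys, so "0001", "0002", ... are fine.
        Map<String, String> abc = new HashMap<>();
        abc.put("0001", "gha");
        abc.put("0002", "bha");

        GenericRecord data = new GenericData.Record(dataSchema);
        data.put("abc", abc);

        GenericRecord record = new GenericData.Record(schema);
        record.put("_id", "xxxx");
        record.put("name", "xyz");
        record.put("data", data);
        System.out.println(record);
    }
}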

JSON schema validation for an object which may have two forms

I'm trying to figure out how to validate a JSON object which may have two forms.
For example, when there is no data available, the JSON could be:
{
  "student": {}
}
When there is data available, the JSON could be:
{
  "student": {
    "id": "someid",
    "name": "some name",
    "age": 15
  }
}
I wrote the JSON schema this way, but it doesn't seem to work:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://json-schema.org/draft-07/schema#",
  "title": "JSON schema validation",
  "properties": {
    "student": {
      "type": "object",
      "oneOf": [
        {
          "required": [],
          "properties": {}
        },
        {
          "required": ["id", "name", "age"],
          "properties": {
            "id": {
              "$id": "#/properties/student/id",
              "type": ["string", "null"]
            },
            "name": {
              "$id": "#/properties/student/name",
              "type": ["string", "null"]
            },
            "age": {
              "$id": "#/properties/student/age",
              "type": ["number"]
            }
          }
        }
      ]
    }
  }
}
I was wondering if there is a way to validate it. Thank you!
An empty properties object and an empty required array do nothing.
JSON Schema is constraints-based, in that if you don't explicitly constrain what is allowed, then it is allowed by default.
You're close, but not quite. The const keyword can take any value, and is what you want in your first item of oneOf.
You can test the following schema here: https://jsonschema.dev/s/Kz1C0
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "https://example.com/myawesomeschema",
  "title": "JSON schema validation",
  "properties": {
    "student": {
      "type": "object",
      "oneOf": [
        {
          "const": {}
        },
        {
          "required": ["id", "name", "age"],
          "properties": {
            "id": {
              "type": ["string", "null"]
            },
            "name": {
              "type": ["string", "null"]
            },
            "age": {
              "type": ["number"]
            }
          }
        }
      ]
    }
  }
}
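To check it programmatically, here's a sketch using the everit-org/json-schema library (the library choice and file name are assumptions; any draft-07-capable validator works the same way):

import java.io.FileInputStream;
import org.everit.json.schema.Schema;
import org.everit.json.schema.loader.SchemaLoader;
import org.json.JSONObject;
import org.json.JSONTokener;

public class StudentCheck {
    public static void main(String[] args) throws Exception {
        // Load the schema above (assumed saved as student-schema.json).
        JSONObject rawSchema = new JSONObject(
                new JSONTokener(new FileInputStream("student-schema.json")));
        Schema schema = SchemaLoader.builder()
                .schemaJson(rawSchema)
                .draftV7Support()
                .build().load().build();

        // Both forms validate; validate() throws ValidationException on failure.
        schema.validate(new JSONObject("{\"student\": {}}"));
        schema.validate(new JSONObject(
                "{\"student\": {\"id\": \"someid\", \"name\": \"some name\", \"age\": 15}}"));
        System.out.println("both forms are valid");
    }
}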

Avro runtime exception "Not a map" when returning in JSON format

I have an Avro schema for UKRecord, which contains a list of CmRecord (also Avro-schemaed):
{
  "namespace": "com.uhdyi.hi.avro",
  "type": "record",
  "name": "UKRecord",
  "fields": [
    {
      "name": "coupon",
      "type": ["null", "string"],
      "default": null
    },
    {
      "name": "cm",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "CmRecord",
            "fields": [
              {
                "name": "id",
                "type": "string",
                "default": ""
              },
              {
                "name": "name",
                "type": "string",
                "default": ""
              }
            ]
          }
        }
      ],
      "default": null
    }
  ]
}
In my Java code, I create a UKRecord which has all fields populated correctly. Eventually I need to return this object using a JSON-based API; however, it complained:
org.apache.avro.AvroRuntimeException: Not a map: {"type":"record","name":"CmRecord","namespace":"com.uhdyi.hi.avro","fields":[{"name":"id","type":"string","default":""},{"name":"name","type":"string","default":""}]}
The Java code that writes the object to JSON is:
ObjectMapper mapper = new ObjectMapper();
ObjectWriter writer = mapper.writer(); // writer() is an instance method, not static
if (obj != null) {
    response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "application/json; charset=UTF-8");
    byte[] bytes = writer.writeValueAsBytes(obj); // <-- failed here
    ...
}
obj is:
{"coupon": "c12345", "cm": [{"id": "1", "name": "name1"}, {"id": "2", "name": "name2"}]}
Why do I get this error? Please help!
Because you are using unions, Avro is uncertain how to interpret the JSON you are providing. Here's how you can change the JSON so Avro knows it's not null:
{
  "coupon": {"string": "c12345"},
  "cm": {"array": [
    {"id": "1", "name": "name1"},
    {"id": "2", "name": "name2"}
  ]}
}
I know, it's really annoying how Avro chose to handle nulls.
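As an aside, Avro's own JSON encoder emits exactly this union-wrapped form, so serializing with it instead of Jackson sidesteps the problem. A minimal sketch, assuming UKRecord is the Avro-generated class:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
import org.apache.avro.specific.SpecificDatumWriter;

public class AvroJson {
    // Writes the record as Avro JSON (with union wrappers), avoiding Jackson,
    // which trips over the internals of Avro-generated classes.
    static byte[] toJson(UKRecord obj) throws IOException { // UKRecord: generated class, assumed
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<UKRecord> writer = new SpecificDatumWriter<>(UKRecord.class);
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(UKRecord.getClassSchema(), out);
        writer.write(obj, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}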

How to define avro schema for complex json document?

I have a JSON document that I would like to convert to Avro and need a schema to be specified for that purpose. Here is the JSON document for which I would like to define the avro schema:
{
  "uid": 29153333,
  "somefield": "somevalue",
  "options": [
    {
      "item1_lvl2": "a",
      "item2_lvl2": [
        {
          "item1_lvl3": "x1",
          "item2_lvl3": "y1"
        },
        {
          "item1_lvl3": "x2",
          "item2_lvl3": "y2"
        }
      ]
    }
  ]
}
I'm able to define the schema for the non-complex types but not for the complex "options" field:
{
  "namespace": "my.com.ns",
  "type": "record",
  "fields": [
    {"name": "uid", "type": "int"},
    {"name": "somefield", "type": "string"},
    {"name": "options", "type": .....}
  ]
}
Thanks for the help!
You need to use Avro complex types, specifically arrays and records, and then nest these together:
{
  "namespace": "my.com.ns",
  "name": "myrecord",
  "type": "record",
  "fields": [
    {"name": "uid", "type": "int"},
    {"name": "somefield", "type": "string"},
    {"name": "options", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "lvl2_record",
        "fields": [
          {"name": "item1_lvl2", "type": "string"},
          {"name": "item2_lvl2", "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "lvl3_record",
              "fields": [
                {"name": "item1_lvl3", "type": "string"},
                {"name": "item2_lvl3", "type": "string"}
              ]
            }
          }}
        ]
      }
    }}
  ]
}
Also, to improve readability, you can split the schema into multiple files, as in the sketch below.
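A single Schema.Parser instance remembers named types across calls, so the inner record can live in its own file and be referenced by name from the main schema. A minimal sketch (file names are assumptions):

import java.io.File;
import org.apache.avro.Schema;

public class SplitSchemas {
    public static void main(String[] args) throws Exception {
        Schema.Parser parser = new Schema.Parser();
        // Parse the inner record first so the main schema can reference it by name.
        parser.parse(new File("lvl3_record.avsc"));
        Schema main = parser.parse(new File("main.avsc")); // refers to "lvl3_record"
        System.out.println(main.toString(true));
    }
}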
This online tool (http://avro4s-ui.landoop.com/) is very practical: you can generate the Avro schema from a given valid JSON.