JSON data mapping of Avro union

Given an incoming JSON payload like this (JSON sample, version 1):
{
  "name": "Alyssa",
  "favorite_number": 7,
  "favorite_color": "blue"
}
what is the easy way to add the data type and get updated JSON that meets the Avro union encoding requirement? (JSON sample, version 2)
{
  "name": "Alyssa",
  "favorite_number": {
    "int": 7
  },
  "favorite_color": null
}
The Avro schema is:
"fields": [
  {
    "name": "name",
    "type": {
      "type": "string",
      "avro.java.string": "String"
    },
    "default": "1234567890"
  },
  {
    "name": "favorite_number",
    "type": ["null", "int"],
    "doc": "The code for originating channel/consumer",
    "default": null
  },
  {
    "name": "favorite_color",
    "type": ["null", "string"],
    "doc": "The communication language selected by customer.",
    "default": null
  }
]
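There is no built-in Avro helper that rewrites plain JSON into the union-tagged form, so one option is a small pre-processing step. A minimal sketch with Jackson, where the field-to-branch mapping is hardcoded from the schema above (an assumption; a production version would derive it from the parsed Schema):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.util.Map;

public class UnionWrapper {
    // Hypothetical mapping of union-typed fields to their non-null branch.
    private static final Map<String, String> UNION_FIELDS =
            Map.of("favorite_number", "int", "favorite_color", "string");

    public static JsonNode wrapUnions(JsonNode plain, ObjectMapper mapper) {
        ObjectNode out = plain.deepCopy();
        UNION_FIELDS.forEach((field, avroType) -> {
            JsonNode value = out.get(field);
            if (value != null && !value.isNull()) {
                // Avro's JSON encoding expects {"<type>": <value>} for union branches.
                ObjectNode wrapped = mapper.createObjectNode();
                wrapped.set(avroType, value);
                out.set(field, wrapped);
            }
        });
        return out;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonNode v1 = mapper.readTree(
                "{\"name\":\"Alyssa\",\"favorite_number\":7,\"favorite_color\":\"blue\"}");
        System.out.println(wrapUnions(v1, mapper));
        // {"name":"Alyssa","favorite_number":{"int":7},"favorite_color":{"string":"blue"}}
    }
}
Note that a non-null favorite_color comes out as {"string": "blue"}; only an actual null stays a bare null, as in sample version 2.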


Kafka cp-server fails message validation on the broker side

I need to validate messages on the broker side.
I run cp-server (simply the cp-all-in-one compose file), then:
created a topic
set confluent.value.schema.validation to true
registered a schema (JSON)
produced a message
Producing always fails. Why does validation fail? Should I change the configuration?
Schema:
{
  "$id": "http://example.com/models/data-item-definition.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "test data item 1",
  "properties": {
    "array_val": {
      "items": {
        "type": "string"
      },
      "type": [
        "array",
        "null"
      ]
    },
    "int_val": {
      "type": "integer"
    },
    "string_val": {
      "type": "string"
    }
  },
  "required": [
    "string_val",
    "int_val"
  ],
  "title": "data item",
  "type": "object"
}
Message:
{
  "string_val": "text",
  "int_val": 10,
  "array_val": ["one", "two", "three"]
}
The issue is that when confluent.value.schema.validation is true, every produced message must start with the magic byte and the schema ID so the broker can look the schema up in Schema Registry; a producer that sends plain JSON bytes will always fail validation.
See Wire Format.
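In practice that means producing with a Schema Registry-aware serializer instead of raw bytes. A minimal sketch, assuming the Confluent kafka-json-schema-serializer dependency and the default cp-all-in-one ports; the topic name data-items is hypothetical, and whether the schema derived from the value matches the registered one still depends on serializer settings such as auto.register.schemas and use.latest.version:
import io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Map;
import java.util.Properties;

public class ValidatedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        // This serializer prepends the magic byte and schema ID (the wire format).
        props.put("value.serializer", KafkaJsonSchemaSerializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            Object value = Map.of("string_val", "text", "int_val", 10);
            producer.send(new ProducerRecord<>("data-items", value));
        }
    }
}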

Converting nested JSON into CSV by adding the nested objects to a single column in Apache NiFi

I have a nested JSON structure as given below:
{
  "Bikes": [
    {
      "Name": "KTM",
      "Model": "2017",
      "Colour": "Yellow"
    },
    {
      "Name": "Yamaha",
      "Model": "2020",
      "Colour": "Black"
    }
  ],
  "Cars": [
    {
      "Name": "BMW",
      "Model": "2017",
      "Colour": "Yellow"
    },
    {
      "Name": "Audi",
      "Model": "2020",
      "Colour": "Black"
    }
  ]
}
My output CSV should look like this:
Bikes (column 1):
{ "Name": "KTM", "Model": "2017", "Colour": "Yellow" }
{ "Name": "Yamaha", "Model": "2020", "Colour": "Black" }
Cars (column 2):
{ "Name": "BMW", "Model": "2017", "Colour": "Yellow" }
{ "Name": "Audi", "Model": "2020", "Colour": "Black" }
I need to store the entire Bikes object in a single column and likewise Cars in a single column.
I am currently using a ConvertRecord processor to convert from JSON to CSV. My Avro schema for both the JSON reader and the CSV writer looks like this:
{
  "name": "Sydney",
  "type": "record",
  "namespace": "sydney",
  "fields": [
    {
      "name": "Bikes",
      "type": {
        "type": "array",
        "items": {
          "name": "Vehicle",
          "type": "record",
          "fields": [
            {
              "name": "Name",
              "type": "string"
            },
            {
              "name": "Model",
              "type": "string"
            },
            {
              "name": "Colour",
              "type": "string"
            }
          ]
        }
      }
    },
    {
      "name": "Cars",
      "type": {
        "type": "array",
        "items": "Vehicle"
      }
    }
  ]
}
but in the ConvertRecord processor I am getting this error:
ConvertRecord[id=4d909c18-0177-1000-c1cd-9456a1775358] Failed to process StandardFlowFileRecord[uuid=63d04fc1-7edd-405b-8a9d-000bcdaa3d6c,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1611930760172-32, container=default, section=32], offset=744686, length=4],offset=0,name=test.json,size=4]; will route to failure: IOException thrown from ConvertRecord[id=4d909c18-0177-1000-c1cd-9456a1775358]: org.codehaus.jackson.JsonParseException: Unexpected character ('T' (code 84)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: org.apache.nifi.stream.io.NonCloseableInputStream#5b8393b6; line: 1, column: 2]
Could anyone help me out on this?
Note that the flowfile in the error is only 4 bytes long and the parser hit an unexpected 'T' where a JSON value was expected, so the content reaching ConvertRecord is likely not the JSON shown above at all. For the restructuring itself, you probably want a Jolt transform, which has its own processor (JoltTransformJSON); restructuring a simple nested JSON like this is one of Jolt's documented examples. See the sketch below.
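For illustration, a minimal Jolt shift spec for the JoltTransformJSON processor; the merged Vehicles target is an assumption about a useful intermediate shape, not the asker's exact two-column CSV layout:
[
  {
    "operation": "shift",
    "spec": {
      "Bikes": { "*": "Vehicles[]" },
      "Cars": { "*": "Vehicles[]" }
    }
  }
]
Each "*" matches an array index and "Vehicles[]" appends the matched object to one combined array; the record writer downstream still decides how each object is rendered into its CSV column.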

Unique value validation for a property using JSON Schema

I have a JSON object like:
{
  "result": [
    {
      "name": "abc",
      "email": "abc.test@mail.com"
    },
    {
      "name": "def",
      "email": "def.test@mail.com"
    },
    {
      "name": "xyz",
      "email": "abc.test@mail.com"
    }
  ]
}
and the schema for this:
{
  "definitions": {},
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://example.com/object1607582431.json",
  "title": "Root",
  "type": "object",
  "required": [
    "result"
  ],
  "properties": {
    "result": {
      "$id": "#root/result",
      "title": "Result",
      "type": "array",
      "default": [],
      "uniqueItems": true,
      "items": {
        "$id": "#root/result/items",
        "title": "Items",
        "type": "object",
        "required": [
          "name",
          "email"
        ],
        "properties": {
          "name": {
            "$id": "#root/result/items/name",
            "title": "Name",
            "type": "string"
          },
          "email": {
            "$id": "#root/result/items/email",
            "title": "Email",
            "type": "string"
          }
        }
      }
    }
  }
}
I am looking for an option to check uniqueness of email irrespective of name. How can I validate that every email is unique?
You can't. No keyword lets you compare one particular data value against another; the only one that comes close is uniqueItems, which compares an array element in toto against the others.
The JSON Schema specification does not currently support this.
You can see the active GitHub issue here: https://github.com/json-schema-org/json-schema-vocabularies/issues/22
However, there are various extensions of JSON Schema that do validate unique fields within lists of objects.
If you happen to be using Python you can use the JsonVL package (which I created). It can be installed with pip install jsonvl and then run with jsonvl data.json schema.json.
Code examples are in the GitHub repo: https://github.com/gregorybchris/jsonvl
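If the check can live in application code instead of the schema, it is only a few lines. A minimal sketch in Java with Jackson (the data.json file name matches the JsonVL example above):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.HashSet;
import java.util.Set;

public class UniqueEmailCheck {
    public static void main(String[] args) throws Exception {
        JsonNode root = new ObjectMapper().readTree(new File("data.json"));
        Set<String> seen = new HashSet<>();
        for (JsonNode item : root.get("result")) {
            String email = item.get("email").asText();
            // Set.add returns false when the element is already present.
            if (!seen.add(email)) {
                throw new IllegalStateException("Duplicate email: " + email);
            }
        }
        System.out.println("All emails unique");
    }
}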

Avro SchemaParseException - “SecurityClassification” is not a defined name

I'm trying to use this Avro schema
{
  "type": "record",
  "name": "ComplianceEntity",
  "namespace": "com.linkedin.events.metadata",
  "fields": [
    {
      "name": "fieldPath",
      "type": "string"
    },
    {
      "name": "complianceDataType",
      "type": {
        "type": "enum",
        "name": "ComplianceDataType",
        "symbols": [
          "NONE",
          "MEMBER_ID"
        ],
        "symbolDocs": {
          "NONE": "None of the following types apply",
          "MEMBER_ID": "ID for LinkedIn members"
        }
      }
    },
    {
      "name": "complianceDataTypeUrn",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "fieldFormat",
      "type": [
        "null",
        {
          "type": "enum",
          "name": "FieldFormat",
          "symbols": [
            "NUMERIC"
          ],
          "symbolDocs": {
            "NUMERIC": "Numerical format, 12345"
          },
          "doc": "The field format"
        }
      ]
    },
    {
      "name": "securityClassification",
      "type": "SecurityClassification"
    },
    {
      "name": "valuePattern",
      "default": null,
      "type": [
        "null",
        "string"
      ]
    }
  ]
}
to generate Java classes using avro-tools:
java -jar ./avro-tools-1.8.2.jar compile schema ComplianceEntity.avsc .
But I am getting the following error message:
Exception in thread "main" org.apache.avro.SchemaParseException: "SecurityClassification" is not a defined name. The type of the "securityClassification" field must be a defined name or a {"type": ...} expression.
Could anyone tell me why SecurityClassification is not identified as a defined name?
You are using it as the type of your field, but you never define it the way ComplianceDataType is defined inline; that is why you are getting the Avro exception:
{
  "name": "securityClassification",
  "type": "SecurityClassification"
}
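One fix is to define the type inline the first time it appears, the same way ComplianceDataType is defined; the symbols below are hypothetical, since the real SecurityClassification definition is not shown in the question:
{
  "name": "securityClassification",
  "type": {
    "type": "enum",
    "name": "SecurityClassification",
    "symbols": ["HIGHLY_CONFIDENTIAL", "CONFIDENTIAL", "UNCLASSIFIED"]
  }
}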
Alternatively, if SecurityClassification lives in its own schema file, make sure you pass all schema files to avro-tools, dependencies first. This has been supported since Avro 1.5.3 (https://issues.apache.org/jira/browse/AVRO-877):
java -jar ./avro-tools-1.8.2.jar compile schema SecurityClassification.avsc ComplianceEntity.avsc .

AvroRuntimeException "Not a map" when returning in JSON format

I have an Avro schema for UKRecord, which contains a list of CmRecord (also defined by an Avro schema):
{
  "namespace": "com.uhdyi.hi.avro",
  "type": "record",
  "name": "UKRecord",
  "fields": [
    {
      "name": "coupon",
      "type": [
        "null",
        "string"
      ],
      "default": null
    },
    {
      "name": "cm",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "CmRecord",
            "fields": [
              {
                "name": "id",
                "type": "string",
                "default": ""
              },
              {
                "name": "name",
                "type": "string",
                "default": ""
              }
            ]
          }
        }
      ],
      "default": null
    }
  ]
}
In my Java code I create a UKRecord with all fields populated correctly. Eventually I need to return this object through a JSON-based API, but it complains:
org.apache.avro.AvroRuntimeException: Not a map: {"type":"record","name":"CmRecord","namespace":"com.uhdyi.hi.avro","fields":[{"name":"id","type":"string","default":""},{"name":"name","type":"string","default":""}]}
The Java code that writes the object to JSON is:
ObjectWriter writer = new ObjectMapper().writer();
if (obj != null) {
    response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "application/json; charset=UTF-8");
    byte[] bytes = writer.writeValueAsBytes(obj); // <-- fails here
    ...
}
obj is:
{"coupon": "c12345", "cm": [{"id": "1", "name": "name1"}, {"id": "2", "name": "name2"}]}
Why do I get this error? Please help!
Because you are using unions, Avro is uncertain how to interpret the JSON you are providing. Here is how you can change the JSON so Avro knows the values are not null:
{
  "coupon": { "string": "c12345" },
  "cm": { "array": [
    { "id": "1", "name": "name1" },
    { "id": "2", "name": "name2" }
  ] }
}
I know, it's really annoying how Avro chose to handle nulls.
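Also note that the stack trace comes from Jackson walking the Avro object directly. A minimal sketch (assuming obj is a GenericRecord or a generated SpecificRecord) that uses Avro's own JSON encoder instead, which applies exactly this union encoding:
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public class AvroJson {
    // Serializes an Avro record to its JSON encoding (union values become {"type": value}).
    static byte[] toJson(GenericRecord record) throws Exception {
        Schema schema = record.getSchema();
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        datumWriter.write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}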