Dynamic Avro schema creation - JSON

How do I create an Avro schema for the JSON below?
{
"_id" : "xxxx",
"name" : "xyz",
"data" : {
"abc" : {
"0001" : "gha",
"0002" : "bha"
}
}
}
Here, the key-value pairs under "abc",
"0001" : "gha",
"0002" : "bha"
are dynamic: both the keys and the values can change.

Maybe a schema like this does what you want?
{
"type": "record",
"name": "MySchema",
"namespace": "my.name.space",
"fields": [
{
"name": "_id",
"type": "string"
},
{
"name": "name",
"type": "string"
},
{
"name": "data",
"type": {
"type": "record",
"name": "Data",
"fields": [
{
"name": "abc",
"type": {
"type": "map",
"values": "string"
}
}
]
}
}
]
}
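The schema itself isn't dynamic, but you can add as many key-value pairs to the map as you like. A map is also the only option here: Avro field names must start with a letter or an underscore, so keys such as "0001" can't be declared as record fields.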
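To see why the dynamic keys have to live in a map, here is a minimal Python sketch that checks the keys from the question against the naming rule in the Avro specification (a letter or underscore, followed by letters, digits, or underscores). The helper name is_valid_avro_name is made up for this illustration.

```python
import re

# Naming rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Return True if `name` could be used as an Avro record or field name."""
    return AVRO_NAME.fullmatch(name) is not None


# Sample document from the question.
sample = {"_id": "xxxx", "name": "xyz", "data": {"abc": {"0001": "gha", "0002": "bha"}}}

# Top-level keys that can be declared as record fields.
fixed_fields = [key for key in sample if is_valid_avro_name(key)]
# Keys under "abc" that are not legal field names and only fit as map keys.
dynamic_keys = [key for key in sample["data"]["abc"] if not is_valid_avro_name(key)]

print(fixed_fields)
print(dynamic_keys)
```

Map keys have no such restriction, since Avro map keys are plain strings.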

Related

How to add a list to Elasticsearch mapping

I have the following JSON format and want to create a mapping for it from the Elasticsearch console:
{
"properties": {
"#timestamp" : {
"type" : "date"
},
"list": [
{
"name": "John",
"age": "37",
"title": "Tester"
}
]
}
}
There's no list or array type in ES. You simply declare the object's fields in the mapping, and any field can then hold either a single object or a list of those objects in your documents:
PUT your-index
{
"mappings": {
"properties": {
"#timestamp" : {
"type" : "date"
},
"list": {
"type": "object",
"properties": {
"name": {
"type": "text"
},
"age": {
"type": "integer"
},
"title": {
"type": "text"
}
}
}
}
}
}
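As a rough illustration of what that mapping accepts, the Python sketch below builds two document bodies: one with a single object in the list field and one with a list of objects. Both shapes fit the same object mapping. The timestamp value and the second person are invented sample data; only the field names come from the question.

```python
import json

# Sample data only: the timestamp and the second entry are made up.
single = {
    "#timestamp": "2021-01-01T00:00:00Z",
    "list": {"name": "John", "age": 37, "title": "Tester"},
}
several = {
    "#timestamp": "2021-01-01T00:00:00Z",
    "list": [
        {"name": "John", "age": 37, "title": "Tester"},
        {"name": "Jane", "age": 41, "title": "Developer"},
    ],
}

# Either body could be sent as the JSON payload of an index request.
for body in (single, several):
    print(json.dumps(body))
```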

How to validate JSON against a complex JSON schema that has multiple levels of references

I have a json schema as follows
{
"$id": "https://example.com/arrays.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"description": "A representation of a person, company, organization, or place",
"type": "object",
"properties": {
"fruits": {
"type": "array",
"items": {
"type": "string"
}
},
"vegetables": {
"type": "array",
"items": { "$ref": "#/$defs/veggie" }
}
},
"required" : ["fruits", "vegetables"],
"$defs": {
"veggie": {
"type": "object",
"required": [ "veggieName", "veggieLike", "cropLocation"],
"properties": {
"veggieName": {
"type": "string",
},
"veggieLike": {
"type": "boolean",
},
"cropLocation" : {
"type" : "object",
"items" :{ "$ref" : "#/$defs/location"}
}
}
},
"location" : {
"type" : "object",
"required" : ["country", "state"],
"properties" : {
"country" : {
"type": "string"
},
"state" : {
"type": "string"
}
}
}
}
}
When I give data as follows, I expect an error saying that cropLocation doesn't have the state and country properties, but it validates successfully against that schema. How do I define a schema with multilevel complex objects?
{
"fruits": [ "apple", "orange", "pear" ],
"vegetables": [
{
"veggieName" : "carrot",
"veggieLike": true,
"cropLocation" : {}
},
{
"veggieName": "broccoli",
"veggieLike": false,
"cropLocation" : {}
}
]
}
I tried various ways to restructure the JSON schema, but it's not working.
The error is here:
"items" :{ "$ref" : "#/$defs/location"}
"cropLocation" is an object, but items is a keyword that only applies to arrays, so it is ignored for this value and the location schema is never checked. Simply remove the items keyword and make the $ref a sibling of "type": "object".

How to write an Avro schema for a JSON document in NiFi

I want to write an Avro schema. The JSON looks like this.
{
"manager": {
"employeeId": "ref-456",
"name": "John Doe"
}
}
This is the schema I have written, but it's wrong. How can I correct it?
{
"namespace":"nifi",
"name":"store_event",
"type":"record",
"fields":[ {
"name" : "manager",
"type" : [
{"name":"employeeId", "type":"string"},
{"name":"name", "type":"string"}
]
}
]
}
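Here it is. The nested object needs its own record type, because a JSON array in a type position declares a union, not a list of fields: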
{
"type": "record",
"name": "nifiRecord",
"namespace": "org.apache.nifi",
"fields": [
{
"name": "manager",
"type": [
"null",
{
"type": "record",
"name": "managerType",
"fields": [
{
"name": "employeeId",
"type": [
"null",
"string"
]
},
{
"name": "name",
"type": [
"null",
"string"
]
}
]
}
]
}
]
}
You can also infer the schema quite easily (I did this using your JSON payload).
Use ConvertRecord with a JsonTreeReader (Schema Access Strategy: Infer Schema) and a JsonRecordSetWriter (Schema Write Strategy: Set 'avro.schema' Attribute). The avro.schema attribute on the resulting FlowFile then shows you the schema.
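To give a feel for what such an inference step does, here is a small Python sketch that derives a schema of the same shape from the sample payload. It is a toy illustration, not NiFi's actual inference logic: the infer_type function, the "Type" suffix on record names, and the choice to make every field nullable are assumptions taken from the schema shown above.

```python
import json


def infer_type(value, name):
    """Map a JSON value to an Avro type; objects become nested records."""
    if isinstance(value, dict):
        return {
            "type": "record",
            "name": name + "Type",
            "fields": [
                {"name": key, "type": ["null", infer_type(item, key)]}
                for key, item in value.items()
            ],
        }
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value for this sketch: {value!r}")


payload = json.loads('{"manager": {"employeeId": "ref-456", "name": "John Doe"}}')

# Record name and namespace are copied from the schema in the answer.
schema = {
    "type": "record",
    "name": "nifiRecord",
    "namespace": "org.apache.nifi",
    "fields": [
        {"name": key, "type": ["null", infer_type(value, key)]}
        for key, value in payload.items()
    ],
}
print(json.dumps(schema, indent=2))
```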

BigQuery: importing a map's JSON data into a table

I have a JSON file containing data from a POJO, which is basically:
String
Map<String, Set<OBJ>>
where each OBJ has:
int
int
string
timestamp
Here is a sample row:
{"key":"123qwe123","mapData":{"3539":[{"id":36,"type":1,"os":"WINDOWS","lastSeenDate":"2015-06-03 22:46:38 UTC"}],"16878":[{"id":36,"type":1,"os":"WINDOWS","lastSeenDate":"2015-06-03 22:26:34 UTC"}],"17312":[{"id":36,"type":1,"os":"WINDOWS","lastSeenDate":"2015-06-03 22:26:48 UTC"}]}}
I tried the following schema, but it's not working:
[
{
"name" : "key",
"type" : "string"
},
{
"name" : "mapData",
"type" : "record",
"mode": "repeated",
"fields": [
{
"name": "some_id",
"type": "record",
"mode" : "repeated",
"fields" : [
{
"name": "id",
"type": "integer",
"mode": "nullable"
},
{
"name": "type",
"type": "integer",
"mode": "nullable"
},
{
"name": "os",
"type": "string",
"mode": "nullable"
},
{
"name": "lastSeenDate",
"type": "timestamp",
"mode": "nullable"
}
]
}
]
}
]
When I run the import I get: repeated record must be imported as a JSON array
I know something is wrong with the schema, but I haven't figured out what yet.

AvroRuntimeException "Not a map" when returning a record in JSON format

I have an Avro schema for UKRecord, which contains a list of CmRecord (also defined in the Avro schema):
{
"namespace": "com.uhdyi.hi.avro",
"type": "record",
"name": "UKRecord",
"fields": [
{
"name": "coupon",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "cm",
"type": [
"null",
{
"type": "array",
"items": {
"type": "record",
"name": "CmRecord",
"fields": [
{
"name": "id",
"type": "string",
"default": ""
},
{
"name": "name",
"type": "string",
"default": ""
}
]
}
}
],
"default": null
}
]
}
In my Java code, I create a UKRecord with all fields populated correctly. Eventually I need to return this object from a JSON-based API, but it fails with:
org.apache.avro.AvroRuntimeException: Not a map: {"type":"record","name":"CmRecord","namespace":"com.uhdyi.hi.avro","fields":[{"name":"id","type":"string","default":""},{"name":"name","type":"string","default":""}]}
The Java code that writes the object to JSON is:
ObjectWriter writer = ObjectMapper.writer();
if (obj != null) {
response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "application/json; charset=UTF-8");
byte[] bytes = writer.writeValueAsBytes(obj); // fails here
...
}
obj is:
{"coupon": "c12345", "cm": [{"id": "1", "name": "name1"}, {"id": "2", "name": "name2"}]}
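Why do I get this error? Please help!
Because you are using unions, Avro is uncertain how to interpret the JSON you are providing. In Avro's JSON encoding, a non-null union value has to be wrapped in an object keyed by the name of the branch type. Here's how you can change the JSON so Avro knows it's not null: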
{
"coupon": { "string": "c12345" },
"cm": { "array": [
{ "id": "1", "name": "name1" },
{ "id": "2", "name": "name2" }
]
}
}
I know, it's really annoying how Avro chose to handle nulls.
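For reference, the wrapping rule used above (null stays null, any other union value becomes a single-key object named after its branch type) can be sketched in Python. wrap_union is a hypothetical helper written for this illustration, not part of any Avro library.

```python
import json


def wrap_union(value, branch):
    """Wrap a value the way Avro's JSON encoding represents a union branch."""
    # Null is written as plain null; everything else is tagged with its type name.
    return None if value is None else {branch: value}


plain = {
    "coupon": "c12345",
    "cm": [{"id": "1", "name": "name1"}, {"id": "2", "name": "name2"}],
}

# Both fields are unions in the UKRecord schema: ["null", "string"] and ["null", array].
encoded = {
    "coupon": wrap_union(plain["coupon"], "string"),
    "cm": wrap_union(plain["cm"], "array"),
}
print(json.dumps(encoded, indent=2))
```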