AVRO schema for JSON - json

I have a JSON which gets generated like this. I wanted to know what would the avro schema for this would be. The number of keys values in array list is not fixed. There are related posts but they have the keys referenced and do not change. In my case the keys change. The names of the variable keys keeps on changing.
"fixedKey": [
{
"variableKey1": 2
},
{
"variableKey2": 1
},
{
"variableKey3": 3
},
.....
{
"variableKeyN" : 10
}
]

The schema should be something like this:
{
"type": "record",
"name": "test",
"fields": [
{
"name": "fixedKey",
"type": {
"type": "array",
"items": [
{"type": "map", "values": "int"},
],
},
}
],
}
Here's an example of serializing and deserializing your example data:
from io import BytesIO
from fastavro import writer, reader
schema = {
"type": "record",
"name": "test",
"fields": [
{
"name": "fixedKey",
"type": {
"type": "array",
"items": [
{"type": "map", "values": "int"},
],
},
}
],
}
records = [
{
"fixedKey": [
{
"variableKey1": 1,
},
{
"variableKey2": 2,
},
{
"variableKey3": 3,
},
]
}
]
bio = BytesIO()
writer(bio, schema, records)
bio.seek(0)
for record in reader(bio):
print(record)

Related

JSON Schema with Nested Objects with different properties

The entire JSON file is rather large so I've only taken out the subsection I've had an issue with.
{
"diagrams": {
"5f759d15cd046720c28531dd": {
"_id": "5f759d15cd046720c28531dd",
"offsetX": 320,
"offsetY": 42,
"zoom": 80,
"modified": 1604279356,
"nodes": {
"5f9f5c3ccd046720c28531e4": {
"nodeID": "5f9f5c3ccd046720c28531e4",
"type": "start",
"coords": [
360,
120
],
"data": {
"name": "Start",
"color": "standard",
"ports": [
{
"type": "",
"target": "5f9f5c3ccd046720c28531e6"
}
],
"steps": []
}
},
"5f9f5c3ccd046720c28531e5": {
"nodeID": "5f9f5c3ccd046720c28531e5",
"type": "block",
"coords": [
760,
120
],
"data": {
"name": "Help Message",
"color": "standard",
"steps": [
"5f9f5c3ccd046720c28531e6",
"5f9f5c3ccd046720c28531e7"
]
}
},
"5f9f5c3ccd046720c28531e6": {
"nodeID": "5f9f5c3ccd046720c28531e6",
"type": "speak",
"data": {
"randomize": false,
"dialogs": [
{
"voice": "Alexa",
"content": "You said help. Do you want to continue?"
}
],
"ports": [
{
"type": "",
"target": "5f9f5c3ccd046720c28531e7"
}
]
}
},
"5f9f5c3ccd046720c28531e7": {
"nodeID": "5f9f5c3ccd046720c28531e7",
"type": "interaction",
"data": {
"name": "Choice",
"else": {
"type": "path",
"randomize": false,
"reprompts": []
},
"choices": [
{
"intent": "",
"mappings": []
},
{
"intent": "",
"mappings": []
}
],
"reprompt": null,
"ports": [
{
"type": "else",
"target": null
},
{
"type": "",
"target": null
},
{
"type": "",
"target": "5f9f5c3ccd046720c28531e9"
}
]
}
},
"5f9f5c3ccd046720c28531e8": {
"nodeID": "5f9f5c3ccd046720c28531e8",
"type": "block",
"coords": [
1170,
260
],
"data": {
"name": "Exit",
"color": "standard",
"steps": [
"5f9f5c3ccd046720c28531e9"
]
}
},
"5f9f5c3ccd046720c28531e9": {
"nodeID": "5f9f5c3ccd046720c28531e9",
"type": "exit",
"data": {
"ports": []
}
}
},
"children": [],
"creatorID": 42661,
"variables": [],
"name": "Help Flow",
"versionID": "5f759d15cd046720c28531db"
}
}
}
The Current JSON Schema Definition I have is:
{
"$schema":"http://json-schema.org/schema#",
"type":"object",
"properties":{
"diagrams":{
"type":"object"
}
},
"required":[
"diagrams",
]
}
The problem I am having is that within diagrams contains multiple objects with a random string as the name e.g "5f759d15cd046720c28531dd".
Then within that object there are properties such as (_id, offsetX) which I want to express as well as a nodes object, which again contains multiple objects with arbitrary names e.g ("5f9f5c3ccd046720c28531e4", "5f9f5c3ccd046720c28531e5", ...) which have a unique node definition where some nodes have different properties to other nodes (nodeID, type, data vs nodeID, type, data, coords).
My question is with all these arbitrary things such as random names as well as different properties per each node. How do I turn it into 1 JSON schema definition which covers all the cases of how a diagram/node can be made.
You can do this with additionalProperties or patternProperties.
additionalProperties applies to any property that isn't declared in properties or patternProperties.
{
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"_id": { ... },
"offsetX": { ... },
...
}
}
}
Your property names appear to always be hex numbers. If you want to enforce that those property names are always hex numbers, you can use patternProperties. Any property that matches the regex must conform to that schema.
{
"type": "object",
"patternProperties": {
"^[0-9a-f]{24}$": {
"type": "object",
"properties": {
"_id": { ... },
"offsetX": { ... },
...
}
}
},
"additionalProperties": false
}

Import JSON with objects as nested to Elastic Search

i've log with thousands records of aggregated data in JSON:
{
"count": 25,
"domain": "domain.tld",
"geoips": {
"AU": 5,
"NZ": 20
},
"ips": {
"1.2.3.4": 5,
"1.2.3.5": 1,
"1.2.3.6": 1,
"1.2.3.7": 1,
"1.2.3.8": 1,
"1.2.3.9": 9,
"1.2.3.10": 7
},
"subdomains": {
"a.domain.tld": 1,
"b.domain.tld": 1,
"c.domain.tld": 1,
"domain.tld": 22
},
"tld": "tld",
"types": {
"1": 3,
"43": 22
}
}
and i have mapping on ES:
"mappings": {
"properties": {
"count": {
"type": "long"
},
"domain": {
"type": "keyword"
},
"ips": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"geoips": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"subdomains": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"tld": {
"type": "keyword"
},
"types": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
}
}
}
Is there any simple way how import these lines to ES as nested objects ? If i use a bulk insert without modification, the ES will modify mapping by adding a new field for each IP/subdomain/GeoIP instead add it as simple key/val object.
Or only one way is regenerate JSON to key/val nested fields ?
Your mapping is already very good but the data doesn't fit it since the nested data type expects an array of objects, not a single object. So you'll need to transform your nested objects into array of key-value pairs like so:
...
"ips": [
{
"key": "1.2.3.4",
"val": 5
},
{
"key": "1.2.3.5",
"val": 1
},
...
],
"subdomains": [
{
"key": "a.domain.tld",
"val": 1
},
{
"key": "b.domain.tld",
"val": 1
},
...
]
...

Json schema validation for object which may have two forms

I'm trying to figure out how to validate a JSON object which may have 2 forms.
for examples
when there is no data available, the JSON could be
{
"student": {}
}
when there is data available, the JSON could be
{
"student":{
"id":"someid",
"name":"some name",
"age":15
}
}
I wrote the JSON schema in this way, but it seems not working
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://json-schema.org/draft-07/schema#",
"title": "JSON schema validation",
"properties": {
"student": {
"type": "object",
"oneOf": [
{
"required": [],
"properties": {}
},
{
"required": [
"id",
"name",
"age"
],
"properties": {
"id": {
"$id": "#/properties/student/id",
"type": [
"string",
"null"
]
},
"name": {
"$id": "#/properties/student/name",
"type": [
"string",
"null"
]
},
"age": {
"$id": "#/properties/student/age",
"type": [
"number"
]
}
}
}
]
}
}
}
I was wondering is there a way to validate it. Thank you!
An empty properties object, and an empty required object, do nothing.
JSON Schema is constraints based, in that if you don't explicitly constrain what is allowed, then it is allowed by default.
You're close, but not quite. The const keyword can take any value, and is what you want in your first item of allOf.
You can test the following schema here: https://jsonschema.dev/s/Kz1C0
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://example.com/myawesomeschema",
"title": "JSON schema validation",
"properties": {
"student": {
"type": "object",
"oneOf": [
{
"const": {}
},
{
"required": [
"id",
"name",
"age"
],
"properties": {
"id": {
"type": [
"string",
"null"
]
},
"name": {
"type": [
"string",
"null"
]
},
"age": {
"type": [
"number"
]
}
}
}
]
}
}
}

At least one of the properties should have a certain value in json schema

I have 2 objects in json schema like below
"object1": {
"type": "number",
"enum": [
0,
1
]
},
"object2": {
"type": "number",
"enum": [
0,
1
]
}
At least one of object1 or object2 or both should be 1, how to achieve this criteria from json schema
This has been resolved using below
"anyOf": [
{
"properties": {
"object1": {
"enum": [
1
]
}
}
},
{
"properties": {
"object2": {
"enum": [
1
]
}
}
}
]

avro runtime exception not a map when return in Json format

i have a avro schema for UKRecord, which contain a list of CMRecord(also avro schemaed):
{
"namespace": "com.uhdyi.hi.avro",
"type": "record",
"name": "UKRecord",
"fields": [
{
"name": "coupon",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "cm",
"type": [
"null",
{
"type": "array",
"items": {
"type": "record",
"name": "CmRecord",
"fields": [
{
"name": "id",
"type": "string",
"default": ""
},
{
"name": "name",
"type": "string",
"default": ""
}
]
}
}
],
"default": null
}
]
}
in my java code, i create a UKRecord which has all fields populated correctly, eventually i need to return this object using a json based api, however it complained:
org.apache.avro.AvroRuntimeException: Not a map: {"type":"record","name":"CmRecord","namespace":"com.uhdyi.hi.avro","fields":[{"name":"id","type":"string","default":""},{"name":"name","type":"string","default":""}]}
the java code that write the object to json is :
ObjectWriter writer = ObjectMapper.writer();
if (obj != null) {
response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "application/json; charset=UTF-8");
byte[] bytes = writer.writeValueAsBytes(obj); <-- failed here
...
}
obj is:
{"coupon": "c12345", "cm": [{"id": "1", "name": "name1"}, {"id": "2", "name": "name2"}]}
why do i get this error? please help!
Because you are using unions, Avro is uncertain how to interpret the JSON you are providing. Here's how you can change the JSON so Avro knows it's not null
{
"coupon": { "string": "c12345" },
"cm": { "array": [
{ "id": "1", "name": "name1" },
{ "id": "2", "name": "name2" }
]
}
}
I know, it's really annoying how Avro chose to handle nulls.