Json schema for recursive key - json

The JSON data is shown below. The same student names repeat under several keys, and the real data has around 100 of them (only a few are shown here). Is there a way to use $defs for the keys and values to simplify the schema?
{
"student_id": {
"Alice": 0,
"Bob": 1,
"Charlie": 2,
"Derek": 3,
"Emily": 4,
"Florence": 5
},
"project": {
"Alice": "Science",
"Bob": "Math",
"Charlie": "Science",
"Derek": "Science",
"Emily": "Math",
"Florence": "Math"
},
"summer_camp": {
"Alice": true,
"Bob": false,
"Charlie": true,
"Derek": false,
"Emily": true,
"Florence": false
},
"Data":[
"student_id",
"project",
"summer_camp"
]
}

You can specify the property names in a reusable definition:
{
"$defs": {
"property_names_students": {
"propertyNames": {
"enum": [
"Alice",
"Bob",
...
]
}
}
},
"type": "object",
"properties": {
"student_id": {
"$ref": "#/$defs/property_names_students",
"additionalProperties": {
"type": "integer"
}
},
"project": {
"$ref": "#/$defs/property_names_students",
"additionalProperties": {
"enum": ["Science", "Math", ... ]
}
},
...
}
}
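With around 100 names you probably don't want to type the enum by hand. Here is a minimal Python sketch that generates a schema of the shape above from a list of names. The function name build_schema is made up for illustration, the boolean type for summer_camp is an assumption (that property is elided above), and the project enum only lists the two values visible in the sample data.

```python
import json

def build_schema(student_names, value_schemas):
    """Build a schema where each top-level key maps student names to values.

    student_names: allowed property names (the enum in the shared definition).
    value_schemas: maps each top-level key to the schema its values must match.
    """
    ref = "#/$defs/property_names_students"
    return {
        "$defs": {
            "property_names_students": {
                "propertyNames": {"enum": list(student_names)}
            }
        },
        "type": "object",
        "properties": {
            # Every key reuses the same name list and only differs in value type.
            key: {"$ref": ref, "additionalProperties": value_schema}
            for key, value_schema in value_schemas.items()
        },
    }

names = ["Alice", "Bob", "Charlie", "Derek", "Emily", "Florence"]
schema = build_schema(
    names,
    {
        "student_id": {"type": "integer"},
        "project": {"enum": ["Science", "Math"]},   # only the values seen in the sample
        "summer_camp": {"type": "boolean"},         # assumption: not shown in the answer
    },
)
print(json.dumps(schema, indent=2))
```

The name list lives in one place, so adding a student means changing one list rather than every property.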

Related

JSON Schema with Nested Objects with different properties

The entire JSON file is rather large so I've only taken out the subsection I've had an issue with.
{
"diagrams": {
"5f759d15cd046720c28531dd": {
"_id": "5f759d15cd046720c28531dd",
"offsetX": 320,
"offsetY": 42,
"zoom": 80,
"modified": 1604279356,
"nodes": {
"5f9f5c3ccd046720c28531e4": {
"nodeID": "5f9f5c3ccd046720c28531e4",
"type": "start",
"coords": [
360,
120
],
"data": {
"name": "Start",
"color": "standard",
"ports": [
{
"type": "",
"target": "5f9f5c3ccd046720c28531e6"
}
],
"steps": []
}
},
"5f9f5c3ccd046720c28531e5": {
"nodeID": "5f9f5c3ccd046720c28531e5",
"type": "block",
"coords": [
760,
120
],
"data": {
"name": "Help Message",
"color": "standard",
"steps": [
"5f9f5c3ccd046720c28531e6",
"5f9f5c3ccd046720c28531e7"
]
}
},
"5f9f5c3ccd046720c28531e6": {
"nodeID": "5f9f5c3ccd046720c28531e6",
"type": "speak",
"data": {
"randomize": false,
"dialogs": [
{
"voice": "Alexa",
"content": "You said help. Do you want to continue?"
}
],
"ports": [
{
"type": "",
"target": "5f9f5c3ccd046720c28531e7"
}
]
}
},
"5f9f5c3ccd046720c28531e7": {
"nodeID": "5f9f5c3ccd046720c28531e7",
"type": "interaction",
"data": {
"name": "Choice",
"else": {
"type": "path",
"randomize": false,
"reprompts": []
},
"choices": [
{
"intent": "",
"mappings": []
},
{
"intent": "",
"mappings": []
}
],
"reprompt": null,
"ports": [
{
"type": "else",
"target": null
},
{
"type": "",
"target": null
},
{
"type": "",
"target": "5f9f5c3ccd046720c28531e9"
}
]
}
},
"5f9f5c3ccd046720c28531e8": {
"nodeID": "5f9f5c3ccd046720c28531e8",
"type": "block",
"coords": [
1170,
260
],
"data": {
"name": "Exit",
"color": "standard",
"steps": [
"5f9f5c3ccd046720c28531e9"
]
}
},
"5f9f5c3ccd046720c28531e9": {
"nodeID": "5f9f5c3ccd046720c28531e9",
"type": "exit",
"data": {
"ports": []
}
}
},
"children": [],
"creatorID": 42661,
"variables": [],
"name": "Help Flow",
"versionID": "5f759d15cd046720c28531db"
}
}
}
The current JSON Schema definition I have is:
{
"$schema":"http://json-schema.org/schema#",
"type":"object",
"properties":{
"diagrams":{
"type":"object"
}
},
"required":[
"diagrams"
]
}
The problem is that diagrams contains multiple objects, each named with a random string, e.g. "5f759d15cd046720c28531dd".
Each of those objects has properties such as _id and offsetX that I want to describe, as well as a nodes object. nodes again contains multiple objects with arbitrary names (e.g. "5f9f5c3ccd046720c28531e4", "5f9f5c3ccd046720c28531e5", ...), and the node definitions differ: some nodes have nodeID, type and data, while others have nodeID, type, data and coords.
My question: given the arbitrary names and the properties that vary per node, how do I write a single JSON Schema definition that covers every way a diagram or node can be built?
You can do this with additionalProperties or patternProperties.
additionalProperties applies to any property that isn't declared in properties or patternProperties.
{
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"_id": { ... },
"offsetX": { ... },
...
}
}
}
Your property names appear to always be hex numbers. If you want to enforce that those property names are always hex numbers, you can use patternProperties. Any property that matches the regex must conform to that schema.
{
"type": "object",
"patternProperties": {
"^[0-9a-f]{24}$": {
"type": "object",
"properties": {
"_id": { ... },
"offsetX": { ... },
...
}
}
},
"additionalProperties": false
}
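To see which property names that regex would pick up, here is a small Python sketch. It is not a JSON Schema validator; it only applies the same pattern to the keys of a dict. The upper-case key and the "notes" key are hypothetical examples, not taken from the question's data.

```python
import re

# Same regex as in the patternProperties example: exactly 24 lowercase hex digits.
HEX_ID = re.compile(r"^[0-9a-f]{24}$")

def split_keys(obj):
    """Return (matching, non_matching) property names of a dict."""
    matching = [k for k in obj if HEX_ID.search(k)]
    non_matching = [k for k in obj if not HEX_ID.search(k)]
    return matching, non_matching

diagrams = {
    "5f759d15cd046720c28531dd": {"offsetX": 320},
    "5F759D15CD046720C28531DD": {},  # upper case: not matched by this regex
    "notes": {},                     # hypothetical stray key
}
matching, non_matching = split_keys(diagrams)
print(matching)
print(non_matching)
```

With "additionalProperties": false in the schema, the names in non_matching are the ones a validator would reject.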

Import JSON with objects as nested to Elastic Search

I have a log with thousands of records of aggregated data in JSON:
{
"count": 25,
"domain": "domain.tld",
"geoips": {
"AU": 5,
"NZ": 20
},
"ips": {
"1.2.3.4": 5,
"1.2.3.5": 1,
"1.2.3.6": 1,
"1.2.3.7": 1,
"1.2.3.8": 1,
"1.2.3.9": 9,
"1.2.3.10": 7
},
"subdomains": {
"a.domain.tld": 1,
"b.domain.tld": 1,
"c.domain.tld": 1,
"domain.tld": 22
},
"tld": "tld",
"types": {
"1": 3,
"43": 22
}
}
and I have this mapping in ES:
"mappings": {
"properties": {
"count": {
"type": "long"
},
"domain": {
"type": "keyword"
},
"ips": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"geoips": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"subdomains": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"tld": {
"type": "keyword"
},
"types": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
}
}
}
Is there a simple way to import these lines into ES as nested objects? If I use a bulk insert without modification, ES modifies the mapping by adding a new field for each IP/subdomain/GeoIP instead of adding it as a simple key/val object.
Or is the only way to regenerate the JSON with key/val nested fields?
Your mapping is already very good, but the data doesn't fit it: the nested data type expects an array of objects, not a single object. So you'll need to transform your nested objects into arrays of key-value pairs, like so:
...
"ips": [
{
"key": "1.2.3.4",
"val": 5
},
{
"key": "1.2.3.5",
"val": 1
},
...
],
"subdomains": [
{
"key": "a.domain.tld",
"val": 1
},
{
"key": "b.domain.tld",
"val": 1
},
...
]
...
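If the log is one JSON record per line, the transformation can be done before the bulk insert. Below is a minimal Python sketch, assuming the four fields mapped as nested in the question (ips, geoips, subdomains, types). The helper name to_nested is made up for illustration, and the sample line is a shortened version of the record above.

```python
import json

# Fields mapped as "nested" in the question's mapping.
NESTED_FIELDS = ("ips", "geoips", "subdomains", "types")

def to_nested(record, fields=NESTED_FIELDS):
    """Return a copy of record with each object field turned into
    a list of {"key": ..., "val": ...} dicts."""
    out = dict(record)
    for field in fields:
        value = out.get(field)
        if isinstance(value, dict):
            out[field] = [{"key": k, "val": v} for k, v in value.items()]
    return out

line = ('{"count": 25, "domain": "domain.tld", "geoips": {"AU": 5, "NZ": 20}, '
        '"tld": "tld", "types": {"1": 3, "43": 22}}')
doc = to_nested(json.loads(line))
print(json.dumps(doc))
```

Fields that are missing or not objects are left untouched, so the same function can be run over every line of the log.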

Parse JSON with map of list

I am new to Scala and JSON parsing and need some help. I need to parse the complex JSON (below) to get the values of "name" under the "dimensions" key, i.e. I need PLATFORM and OS_VERSION.
I tried multiple options, but none of them worked. Any help is appreciated.
This is a snippet of the code I tried, but I am not able to proceed further in parsing the list. I believe the Any type is causing some mismatch.
import org.json4s._
import org.json4s.jackson.JsonMethods._
implicit val formats = org.json4s.DefaultFormats
val mapJSON = parse(tmp).extract[Map[String, Any]]
println(mapJSON)
//for ((k,v) <- mapJSON) printf("key: %s, value: %s\n", k, v)
val list_map = mapJSON("dimensions")
{
"uuid": "uuidddd",
"last_modified": 1559080222953,
"version": "2.6.1.0",
"name": "FULL_DAY_2_mand_date",
"is_draft": false,
"model_name": "FULL_DAY_1_may05",
"description": "",
"null_string": null,
"dimensions": [
{
"name": "PLATFORM",
"table": "tbl1",
"column": "PLATFORM",
"derived": null
},
{
"name": "OS_VERSION",
"table": "tbl1",
"column": "OS_VERSION",
"derived": null
}
],
"measures": [
{
"name": "_COUNT_",
"function": {
"expression": "COUNT",
"parameter": {
"type": "constant",
"value": "1"
},
"returntype": "bigint"
}
},
{
"name": "UU",
"function": {
"expression": "COUNT_DISTINCT",
"parameter": {
"type": "column",
"value": "tbl1.USER_ID"
},
"returntype": "hllc(12)"
}
},
{
"name": "CONT_SIZE",
"function": {
"expression": "SUM",
"parameter": {
"type": "column",
"value": "tbl1.SIZE"
},
"returntype": "bigint"
}
},
{
"name": "CONT_COUNT",
"function": {
"expression": "SUM",
"parameter": {
"type": "column",
"value": "tbl1.COUNT"
},
"returntype": "bigint"
}
}
],
"dictionaries": [],
"rowkey": {
"rowkey_columns": [
{
"column": "tbl1.OS_VERSION",
"encoding": "dict",
"encoding_version": 1,
"isShardBy": false
},
{
"column": "tbl1.PLATFORM",
"encoding": "dict",
"encoding_version": 1,
"isShardBy": false
},
{
"column": "tbl1.DEVICE_FAMILY",
"encoding": "dict",
"encoding_version": 1,
"isShardBy": false
}
]
},
"hbase_mapping": {
"column_family": [
{
"name": "F1",
"columns": [
{
"qualifier": "M",
"measure_refs": [
"_COUNT_",
"CONT_SIZE",
"CONT_COUNT"
]
}
]
},
{
"name": "F2",
"columns": [
{
"qualifier": "M",
"measure_refs": [
"UU"
]
}
]
}
]
},
"aggregation_groups": [
{
"includes": [
"tbl1.PLATFORM",
"tbl1.OS_VERSION"
],
"select_rule": {
"hierarchy_dims": [],
"mandatory_dims": [
"tbl1.DATE_HR"
],
"joint_dims": []
}
}
],
"signature": "ttrrs==",
"notify_list": [],
"status_need_notify": [
"ERROR",
"DISCARDED",
"SUCCEED"
],
"partition_date_start": 0,
"partition_date_end": 3153600000000,
"auto_merge_time_ranges": [
604800000,
2419200000
],
"volatile_range": 0,
"retention_range": 0,
"engine_type": 4,
"storage_type": 2,
"override_kylin_properties": {
"job.queuename": "root.production.P0",
"is-mandatory-only-valid": "true"
},
"cuboid_black_list": [],
"parent_forward": 3,
"mandatory_dimension_set_list": [],
"snapshot_table_desc_list": []
}
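You need to define more specific classes for the data you want to extract, something like this:
import org.json4s._
import org.json4s.jackson.JsonMethods._
implicit val formats: Formats = DefaultFormats
// Only the fields you need have to be declared; other JSON fields are ignored.
case class Dimension(name: String, table: String, column: String)
case class AllData(uuid: String, dimensions: List[Dimension])
val data = parse(tmp).extract[AllData]
val names = data.dimensions.map(_.name)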

How can I explicitly constrain multiple items in a JSON Schema array?

I am creating a JSON schema and want to define an array containing only exact matches for certain items:
An example of the sort of JSON (snippet) would look like:
{
"results":
[
{ "id": 1, "test": true, "volts": 700, "duration": 100 },
{ "id": 2, "test": false }
]
}
This seems to be a combination of OneOf and "additionalProperties": false but I can't work out how that should be used. So far I have:
{
"results":
{
"type": "array",
"items":
{
"type": "object",
"OneOf":
[
{
"id": { "type": "integer" },
"test": { "type": "boolean" },
"volts": { "type": "integer" },
"duration": { "type": "integer" }
},
{
"id": { "type": "integer" },
"test": { "type": "boolean" }
}
],
"additionalProperties": false
}
}
}
I'm using http://www.jsonschemavalidator.net/ to check my JSON.
But when I validate the following JSON against my schema it says it's valid; is the website incorrect or have I done something wrong?
{
"results": [
{
"fred": 7,
"id": 7,
"test": true,
"volts": 7,
"duration": 7
},
{
"fish": 7
}
]
}

How to query nested structure in elasticsearch

Below are two mocked records from my elasticsearch index. I have millions of records in my ES. I am trying to query ES to get all the records that have a non-empty/non-null "tags" field. If a record doesn't have a tag (like the second record below), I don't want to pull it from ES.
If "books" were not nested, then from googling around it seems the query below would have worked -
curl -XGET 'host:port/book_indx/book/_search?' -d '{
"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source"}}}}
}'
However I am not finding a solution to query the nested structure. I tried the below with no luck -
{"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source.tags"}}}}}
{"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source":{"tags"}}}}}}
Any suggestions are really appreciated here! Thanks in advance.
{
"_shards": {
"failed": 0,
"successful": 12,
"total": 12
},
"hits": {
"hits": [
{
"_id": "book1",
"_index": "book",
"_source": {
"book_name": "How to Get Organized",
"publication_date": "2014-02-24T16:50:39+0000",
"tags": [
{
"category": "self help",
"topics": [
{
"name": "time management",
"page": 6198
},
{
"name": "calendar",
"page": 10
}
],
"id": "WEONWOIR234LI"
}
],
"last_updated": "2015-11-11T16:28:32.308+0000"
},
"_type": "book"
},
{
"_id": "book2",
"_index": "book",
"_source": {
"book_name": "How to Cook",
"publication_date": "2014-02-24T16:50:39+0000",
"tags": [],
"last_updated": "2015-11-11T16:28:32.308+0000"
},
"_type": "book"
}
],
"total": 1
},
"timed_out": false,
"took": 80
}
Mapping -
"book": {
"_id": {
"path": "message_id"
},
"properties": {
"book_name": {
"index": "not_analyzed",
"type": "string"
},
"publication_date": {
"format": "date_time||date_time_no_millis",
"type": "date"
},
"tags": {
"properties": {
"category": {
"index": "not_analyzed",
"type": "string"
},
"topic": {
"properties": {
"name": {
"index": "not_analyzed",
"type": "string"
},
"page": {
"index": "no",
"type": "integer"
}
}
},
"id": {
"index": "not_analyzed",
"type": "string"
}
},
"type": "nested"
},
"last_updated": {
"format": "date_time||date_time_no_millis",
"type": "date"
}
}
}
Since your tags field has a nested type, you need to use a nested filter in order to query it.
The following filtered query returns only the first document above (i.e. the one with id book1):
{
"query": {
"filtered": {
"filter": {
"nested": {
"path": "tags",
"filter": {
"exists": {
"field": "tags"
}
}
}
}
}
}
}