JSON Schema - matching based on array inclusion - json

I have the following simple JSON schema that does the regular expression match based on the content field of my data:
{
"$schema":"http://json-schema.org/schema#",
"allOf":[
{
"properties":{
"content":{
"pattern":"some_regex"
}
}
}
]
}
It successfully matches the following data:
{
"content": "some_regex"
}
Now let's say I want to add a list of UUIDs to ignore to my data:
{
"content": "some_regex",
"ignoreIds": ["123", "456"]
}
The problem arises when I want to modify my schema so that it does not match when a given value is present in the ignoreIds list.
Here is my failed attempt:
{
  "$schema": "http://json-schema.org/schema#",
  "allOf": [{
    "properties": {
      "content": {
        "pattern": "some_regex"
      }
    }
  }, {
    "properties": {
      "ignoreIds": {
        "not": {
          // how do I say 'do not match if "123" is in the ignoreIds array'????
        }
      }
    }
  }]
}
Any help will be appreciated!

Your JSON schema for ignoreIds has to be:
"ignoreIds": {
"type": "array",
"items": {
"type": "integer",
"not": {
"enum": [131, 132]
}
}
}
which says:
if any item in the ignoreIds array equals one of the values listed in the enum
(put whichever values you want to reject there), the JSON is invalid.
This works of course for an array of strings also:
"ignoreIds": {
"type": "array",
"items": {
"type": "string",
"not": {
"enum": ["131", "132"]
}
}
}
Tested with JSON Schema Lint
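
To see what the items / not / enum combination asserts, here is a minimal Python sketch. It is not a real JSON Schema validator: validate is a hypothetical helper written for this example that understands only the five keywords used above (and only four type names), and the rejected ID "123" is taken from the question.

```python
def validate(schema, value):
    """Evaluate a tiny subset of JSON Schema: type, enum, not, items, properties."""
    type_checks = {
        "array": lambda v: isinstance(v, list),
        "string": lambda v: isinstance(v, str),
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "object": lambda v: isinstance(v, dict),
    }
    if "type" in schema and not type_checks[schema["type"]](value):
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    # "not" inverts the result of its subschema.
    if "not" in schema and validate(schema["not"], value):
        return False
    # "items" applies its subschema to every element of an array.
    if "items" in schema and isinstance(value, list):
        if not all(validate(schema["items"], item) for item in value):
            return False
    # "properties" constrains only the keys that are actually present.
    if "properties" in schema and isinstance(value, dict):
        for key, sub in schema["properties"].items():
            if key in value and not validate(sub, value[key]):
                return False
    return True


SCHEMA = {
    "properties": {
        "ignoreIds": {
            "type": "array",
            "items": {"type": "string", "not": {"enum": ["123"]}},
        }
    }
}

print(validate(SCHEMA, {"content": "some_regex", "ignoreIds": ["456"]}))         # True
print(validate(SCHEMA, {"content": "some_regex", "ignoreIds": ["123", "456"]}))  # False
```

A single rejected ID anywhere in the array is enough to fail the whole document, because items requires every element to pass.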

Related

Is there support in JSON Schema for deep object validation?

I was looking around the docs and couldn't find any direct or indirect solution.
Is there any way to get validation on JSON objects without knowing exactly where the specific object is located?
For example, I want to validate the following sub-object:
{
"grandParent": {
"parent": {
"child": {
"name": "John"
}
}
}
}
The object can be part of a larger JSON file that can be structured as follows:
{
"root": {
"someKey": {
"grandParent": ...
},
"grandParent": ...,
...<go in even deeper>: {
"grandParent": ...
}
}
}
Can I create a json schema that validates the object no matter where it is?
Similar example in glob would be: root.**.grandParent.parent.child
You'll need to use a combination of additionalProperties, items, and recursive references.
First, we define the structure you want to validate. You have to define properties for each layer of the object.
Next, you want your root level to reference that definition. Because you're using a draft earlier than 2019-09, where keywords placed next to a $ref are ignored, you'll need to wrap that reference in an allOf.
Then you want to make sure that for objects, the values have the root schema applied, and for arrays, each item has the root schema applied.
The use of "$ref": "#" resolves to the root of the schema, which creates the cyclical reference.
Some implementations may not like this, but most should be able to handle it.
Here's a live demo of the below schema: https://jsonschema.dev/s/lBrZk
{
"$schema": "http://json-schema.org/draft-07/schema",
"definitions": {
"grandParentToChild": {
"properties": {
"grandParent": {
"properties": {
"parent": {
"properties": {
"child": {
"properties": {
"name": {
"type": "string"
}
}
}
}
}
}
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/grandParentToChild"
}
],
"additionalProperties": {
"$ref": "#"
},
"items": {
"$ref": "#"
}
}
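
The recursion that "$ref": "#" sets up can be pictured in plain Python. This is a sketch of the idea only, not a schema validator; the function names name_is_valid and valid_everywhere are made up for illustration.

```python
def name_is_valid(node):
    """Mirror the grandParentToChild definition for a single object."""
    current = node
    for key in ("grandParent", "parent", "child"):
        # "properties" constrains only keys that are present on objects.
        if not isinstance(current, dict) or key not in current:
            return True
        current = current[key]
    if isinstance(current, dict) and "name" in current:
        return isinstance(current["name"], str)
    return True


def valid_everywhere(node):
    """Apply the check at this level, then recurse like additionalProperties/items."""
    if isinstance(node, dict):
        return name_is_valid(node) and all(valid_everywhere(v) for v in node.values())
    if isinstance(node, list):
        return all(valid_everywhere(item) for item in node)
    return True


good = {"root": {"someKey": {"grandParent": {"parent": {"child": {"name": "John"}}}}}}
bad = {"root": {"deep": [{"grandParent": {"parent": {"child": {"name": 42}}}}]}}

print(valid_everywhere(good))  # True
print(valid_everywhere(bad))   # False
```

Every object, at any depth, gets the same check applied, which is what the cyclical reference achieves in the schema.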

How to check in elasticsearch if a JSON object has a key using the DSL?

If I have two documents within an index of the following format, I just want to weed out the ones which have an empty JSON instead of my expected key.
A
{
"search": {
"gold": [1,2,3,4]
}
}
B
{
"search":{}
}
I should get only document A, not document B.
I've tried the exists query to search for "gold", but it just checks for non-null values and returns the list.
Note: The following doesn't do what I want.
GET test/_search
{
"query": {
"bool": {
"must": [
{
"exists": { "field": "search.gold" }}
]
}
}
}
This is a simple question but I'm unable to find a way to do it even after searching through their docs.
If someone can help me do this it would be really great.
The simplified mapping of the index is:
"test": {
"mappings": {
"carts": {
"dynamic": "true",
"_all": {
"enabled": false
},
"properties": {
"line_items": {
"properties": {
"line_items_dyn_arr": {
"type": "nested",
"properties": {
"dynamic_key": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
Are you storing the complete JSON in the search field?
If that is not the case, please share the mapping of your index and some sample data.
Update: Query for nested field:
{
"query": {
"nested": {
"path": "search",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "search.gold"
}
}
]
}
}
}
}
}
For nested type fields we need to specify the path and query to be executed on nested fields since nested fields are indexed as child documents.
Elastic documentation: Nested Query
UPDATE based on the mapping added in question asked:
{
"query": {
"nested": {
"path": "line_items.line_items_dyn_arr",
"query": {
"exists": {
"field": "line_items.line_items_dyn_arr"
}
}
}
}
}
Notice that we used "path": "line_items.line_items_dyn_arr". We have to provide the full path because the nested field line_items_dyn_arr is itself under the line_items object. Had line_items_dyn_arr been a top-level property of the mapping rather than a property of an object or nested field, the previous query would have worked fine.
Nishant's answer is right, but for some reason I could get it working only when both path and field are given as full paths.
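
If you build these request bodies from code, a small helper keeps the full path in one place. This is a hedged sketch in Python that only constructs the JSON body shown above; nested_exists_query is a hypothetical name, and sending the body to the _search endpoint is left out.

```python
import json


def nested_exists_query(path, field=None):
    """Build a nested + exists search body; `field` defaults to the nested path itself."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"exists": {"field": field or path}},
            }
        }
    }


body = nested_exists_query("line_items.line_items_dyn_arr")
print(json.dumps(body, indent=2))
```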
The following works for me.
{
"nested": {
"path": "search.gold",
"query": {
"exists": {
"field": "search.gold"
}
}
}
}

JSONSchema how to define a schema for a dynamic object

I have a JSON response that I am trying to create a JSONSchema for
{
"gauges": {
"foo": {
"value": 1234
},
"bar": {
"value": 12.44
}
}
}
It is important to know that the objects in the associative array gauges are dynamically generated so there can be zero to many. Each object in gauges will always have a value property and it will always be a number.
So each of these is valid:
Example 1
{
"gauges": {
"foo": {
"value": 1234
}
}
}
Example 2
{
"gauges": {
"dave": {
"value": 0.44
},
"tommy": {
"value": 12
},
"steve": {
"value": 99999
}
}
}
Example 3
{
"gauges": {}
}
I have looked through the specification, and if this were an array I know I could use anyOf, but I am unsure how to do this or whether it is even possible.
NB I cannot change the format of the JSON
Conceptually, what you want is an object representing a typed map.
The difficulty is that you have no named property under which to put your specification in the schema, but in that case you can use "additionalProperties":
{
"type": "object",
"properties": {
"gauges": {
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"value": {"type": "number"}
}
}
}
}
}
The "gauges" property is defined as an object in which every additional property (that is, every key, since none are declared by name) must be an object whose "value" property is a number.
Note: In Java you would deserialize it to a Map<String,Value>, with Value being the class that contains the value. (I don't know about other typed languages, but I am open to suggestions.)
This answer points to an analogous solution:
How to define JSON Schema for Map<String, Integer>?
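
As a rough illustration of what that schema accepts, here is a plain-Python sketch that mirrors it by hand. It is not a JSON Schema validator, and gauges_valid and is_number are hypothetical names used only for this example.

```python
def is_number(value):
    # JSON Schema "number" covers integers and floats; Python's bool is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def gauges_valid(doc):
    """Mirror the schema: gauges is an object whose values are objects with a numeric value."""
    if not isinstance(doc, dict):
        return False
    gauges = doc.get("gauges", {})
    if not isinstance(gauges, dict):
        return False
    for gauge in gauges.values():
        if not isinstance(gauge, dict):
            return False
        # The schema does not list "value" as required, so it is checked only if present.
        if "value" in gauge and not is_number(gauge["value"]):
            return False
    return True


print(gauges_valid({"gauges": {"foo": {"value": 1234}, "bar": {"value": 12.44}}}))  # True
print(gauges_valid({"gauges": {}}))                                                 # True
print(gauges_valid({"gauges": {"foo": {"value": "high"}}}))                         # False
```

Note that, as written, the schema accepts a gauge with no value at all; adding "required": ["value"] inside additionalProperties would close that gap.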

How do Elasticsearch's "include_in_parent" / "include_in_root" work? Should it show in '_source'?

In a simple Elasticsearch mapping like this:
{
"personal_document": {
"analyzer": "standard",
"_timestamp": {
"enabled": true
},
"properties": {
"description": {
"type": "multi_field",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed"
},
"description": {
"type": "string",
"include_in_root": true
}
}
},
"my_nested": {
"type": "nested",
"include_in_root": true,
"properties": {
"description": {
"type": "string"
}
}
}
}
}
}
... isn't "include_in_root": true supposed to add the field my_nested.description to the root document?
And during a query, am I not supposed to see that field in the _source field?
Also, would specifying a highlight directive on the field 'my_nested.description' automatically retrieve the included-in-root value instead of the nested field?
(something like this)
"highlight": {
"fields": {
"description": {},
"my_nested.description": {}
}
}
Or do I have some misunderstanding of the official nested type documentation?
(that is not really clear)
If the include_in_parent or include_in_root options are enabled on the nested documents then Elasticsearch internally indexes the data with nested fields flattened on the parent document. However, this is just internal for Elasticsearch and you'll never see them in the _source field.
If the user field is of type object, this document would be indexed
internally something like this...
as it is described here.
Thus, you continue to perform actions (like the highlights that you mention) by referring to the nested document's fields. The highlight syntax that you refer to should look like this
"highlight": {
"fields": {
"my_nested.description": {}
}
}
and not
"highlight": {
"fields": {
"description": {}
}
}
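
The split between what is stored and what is indexed can be pictured with a small Python sketch. It is purely conceptual and does not reproduce Elasticsearch's actual indexing; index_view is a hypothetical function, and the sample document is invented for illustration.

```python
import copy


def index_view(source, nested_field):
    """Return (stored _source, extra flattened fields) for one document."""
    flattened = {}
    for item in source.get(nested_field, []):
        for key, value in item.items():
            # Each nested value is also collected under a dotted field name.
            flattened.setdefault(f"{nested_field}.{key}", []).append(value)
    # _source is returned exactly as it was sent; only the index gains extra fields.
    return copy.deepcopy(source), flattened


doc = {
    "description": "top level",
    "my_nested": [{"description": "first"}, {"description": "second"}],
}
stored, extra = index_view(doc, "my_nested")
print(stored == doc)  # True
print(extra)          # {'my_nested.description': ['first', 'second']}
```

The flattened fields are addressed by their dotted name, which is why the highlight request refers to my_nested.description.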
You can use a wildcard for specifying highlight field:
POST /test-1/page/_search
{
"query": {
"query_string": {
"query": "Lorem ipsum"
}
},
"highlight" : {
"fields" : {
"*" : {}
}
}
}
Whether it's a good idea, I don't know. I guess it depends on your application.
This also works with nested documents, by the way, but it seems to hiccup when doing attachments on nested documents without include_in_root.

Json Schema Reference Property

In my JSON schema file I'm trying to force a reference property, $ref, onto the user. However, the following does not work:
-- sampleSchema.json --
{
"definitions": {
"items": {
"type": "object",
"properties": {
"ref": {
"type": "string"
}
}
}
},
"properties": {
"items" : {
"$ref": "#/definitions/items"
}
}
}
The desired output is this, where the user must provide a reference path:
-- whatever.json --
{
"$schema" : "sampleSchema.json?reload",
"items": {
"$ref": "/myEntityReferenceOfChoice"
}
}
In the schema file, if I leave the $ in, it doesn't work. If I take it out and use just 'ref', it does. Can I force the user to supply a $ref?
I'm using Visual Studio 2013.