I was trying to come up with a schema to validate JSON objects like the following:
{
"id":"some_id",
"properties":{
"A":{
"name":"a",
"isindex":true
},
"B":{
"name":"b"
},
"C":{
"name":"c"
}
}
}
The deal is:
properties A, B, C are not known beforehand and can be any strings.
One and only one of the properties (A, B, C ...) has an "isindex":true key-value pair in its value, to indicate that the property will be used as an index. That is to say, the following is invalid:
{
"id":"some_id",
"properties":{
"A":{
"type":"string",
"isindex":true
},
"B":{
"type":"string"
},
"C":{
"type":"array",
"isindex":true
}
}
}
Actually, I am not sure if JSON Schema is the right tool for this.
Any or all help is appreciated!
JSON Schema is the right tool for this kind of thing, but you have stumbled on a specific case that it doesn't handle. You can assert that at least one matches a particular schema, but you can't assert that one and only one matches.
The best thing you can do is change your data structure to something like this ...
{
"id":"some_id",
"properties":{
"A":{
"name":"a"
},
"B":{
"name":"b"
},
"C":{
"name":"c"
}
},
"index": "A"
}
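Whichever layout is used, the cross-property rule can also be checked in application code after schema validation. The following is a minimal Python sketch, assuming the document has already been decoded; `check_index` and `count_isindex` are hypothetical helper names, not part of any JSON Schema library.

```python
import json

def check_index(doc: dict) -> bool:
    """Restructured layout: "index" must name an existing key of "properties"."""
    properties = doc.get("properties")
    index = doc.get("index")
    # A single string can only name one property, so "one and only one"
    # reduces to a membership test.
    return (isinstance(properties, dict)
            and isinstance(index, str)
            and index in properties)

def count_isindex(doc: dict) -> int:
    """Original layout: count the properties flagged with "isindex": true."""
    properties = doc.get("properties") or {}
    return sum(1 for value in properties.values()
               if isinstance(value, dict) and value.get("isindex") is True)

restructured = json.loads(
    '{"id": "some_id",'
    ' "properties": {"A": {"name": "a"}, "B": {"name": "b"}},'
    ' "index": "A"}'
)
original = json.loads(
    '{"id": "some_id",'
    ' "properties": {"A": {"isindex": true}, "C": {"isindex": true}}}'
)

print(check_index(restructured))      # True
print(count_isindex(original) == 1)   # False: two properties are flagged
```

The restructured layout has the advantage that an invalid state (two indexes) cannot even be written down, so the remaining check is a simple lookup.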
I am developing a JSON Schema for validating documents like this one:
{
"map": [
{
"key": "mandatoryKey1",
"value": "value1"
},
{
"key": "mandatoryKey2",
"value": "value2"
},
{
"key": "otherStuff",
"value": "value3"
},
{
"key": "someMoreStuff",
"value": "value4"
}
]
}
The document needs a "map" array whose elements contain keys and values. There MUST be two elements with mandatoryKey1 and mandatoryKey2. Any other key-value pairs are allowed, and the order of the elements should not matter. I found this difficult to express in JSON Schema. I can force the schema to check for the mandatory keys like this (I left out the definitions part as it is trivial):
"map": {
"type": "array",
"minItems": 2,
"items": {
"oneOf": [
{
"$ref": "#/definitions/mandatoryElement1"
},
{
"$ref": "#/definitions/mandatoryElement2"
}
]
}
}
The problems are:
It validates that a document includes the mandatory data, but does not permit any other key/value pairs.
It does not check for duplicates, so it can be cheated by including mandatoryElement1 twice. Uniqueness of items can only be checked by tuple validation, which I cannot apply here because the item order should not matter.
The basic problem I see here is that the array elements somehow need to know about each other, i.e. arbitrary key/value pairs are allowed ONLY IF the mandatory keys are present. This "conditional validation" does not seem to be possible with JSON Schema. Any ideas for a better approach?
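For comparison, the rule itself is short when written as ordinary code over the decoded document, which at least pins down what a schema would have to express. This is a Python sketch rather than a JSON Schema solution; `has_mandatory_keys` and the `MANDATORY` set are names made up for illustration, and it assumes each mandatory key must appear exactly once.

```python
from collections import Counter

# Assumption: these are the only two mandatory keys, each allowed once.
MANDATORY = {"mandatoryKey1", "mandatoryKey2"}

def has_mandatory_keys(doc: dict) -> bool:
    """True if every mandatory key occurs exactly once in doc["map"]."""
    counts = Counter(
        item.get("key") for item in doc.get("map", []) if isinstance(item, dict)
    )
    # Other keys are ignored, so arbitrary extra pairs are allowed
    # and the order of the elements does not matter.
    return all(counts[key] == 1 for key in MANDATORY)

doc = {
    "map": [
        {"key": "otherStuff", "value": "value3"},
        {"key": "mandatoryKey2", "value": "value2"},
        {"key": "mandatoryKey1", "value": "value1"},
    ]
}
print(has_mandatory_keys(doc))  # True
```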
I have an object definition in a common.json file that I need to reuse in a number of other JSON files. Is there any way to include my common.json file in other JSON files?
Edit:
I came across JSON Pointer while searching, which made me think JSON alone can handle it. To be more clear:
common.json
{
"common":
{
"course":
{
"type": "object",
"properties":
{
"course_name": { "type": "string" },
"course_id": { "type": "integer" },
"course_room": { "type": "integer" }
}
}
}
}
other.json
{
"weekly_schedule":
{
"mathematics": { "$ref": "common.json#/common/course" },
"history": { "$ref": "common.json#/common/course" }
}
}
What I understand from here is I can refer to a common JSON object from elsewhere using its path and the $ref keyword. Is that correct or am I missing some point?
JSON is a very simple metaformat. If you take a look at its specification, you will see how simple it is. In particular, it doesn't define any means of aggregation, namespaces, or schemata of the kind available in XML.
If you want to manipulate JSON or compose different JSON-files, you either treat them as a whole (i.e. as text) and then apply text tools or you decode them, manipulate the received data and then encode the results again.
No, JSON is just text. It doesn't do anything on its own.
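The decode, manipulate, encode route described above can be sketched in Python. This is a hand-rolled illustration, not a standard feature of JSON: the `resolve_pointer` and `inline_refs` helpers are hypothetical, the documents are held in an in-memory dict keyed by file name instead of being read from disk, and each reference is assumed to spell out the full path to its target (`#/common/course` for the common.json shown in the question).

```python
import json

def resolve_pointer(document, pointer: str):
    """Follow an RFC 6901 JSON Pointer such as "/common/course"."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    target = document
    for token in pointer[1:].split("/"):
        # Undo the two escapes defined by RFC 6901, in this order.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(target, list):
            target = target[int(token)]
        else:
            target = target[token]
    return target

def inline_refs(node, documents: dict):
    """Return a copy of node with each {"$ref": "file#pointer"} replaced.

    Cyclic references are not detected in this sketch.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and len(node) == 1:
            file_name, _, pointer = ref.partition("#")
            resolved = resolve_pointer(documents[file_name], pointer)
            return inline_refs(resolved, documents)
        return {key: inline_refs(value, documents) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, documents) for item in node]
    return node

documents = {
    "common.json": json.loads(
        '{"common": {"course": {"type": "object", "properties": {'
        '"course_name": {"type": "string"}, "course_id": {"type": "integer"}}}}}'
    )
}
other = json.loads(
    '{"weekly_schedule": {'
    '"mathematics": {"$ref": "common.json#/common/course"},'
    '"history": {"$ref": "common.json#/common/course"}}}'
)

merged = inline_refs(other, documents)
print(json.dumps(merged, indent=2))
```

The point of the sketch is that the `$ref` convention only means something to the program that interprets it; a plain JSON parser returns it as an ordinary key.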
Let's say some of my documents have the following structure:
{
"something":{
"a":"b"
},
"some_other_thing":{
"c":"d"
},
"what_i_want":{
"is_down_here":[
{
"some":{
"not_needed":"object"
},
"another":{
"also_not_needed":"object"
},
"i_look_for":"this_tag",
"tag_properties":{
"this":"that"
}
},
{
"but_not":{
"down":"here"
}
}
]
}
}
Is there a Mango JSON selector that can successfully select on "i_look_for" having the value "this_tag"? It's inside an array (I know its position in the array). I'm also interested in filtering the result so I only get the "tag_properties" in the result.
I have tried a lot of things, including $elemMatch, but mostly everything returns "invalid json".
Is that even a use case for Mango, or should I stick with views?
With Cloudant Query (Mango) selector statements, you still need to define an appropriate index before querying. With that in mind, here's your answer:
json-type CQ index
{
"index": {
"fields": [
"what_i_want.is_down_here.0"
]
},
"type": "json"
}
Selector against json-type index
{
"selector": {
"what_i_want.is_down_here.0": {
"i_look_for": "this_tag"
},
"what_i_want.is_down_here.0.tag_properties": {
"$exists": true
}
},
"fields": [
"_id",
"what_i_want.is_down_here.0.tag_properties"
]
}
The solution above assumes that you always know/can guarantee the fields you want are within the 0th element of the is_down_here array.
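To make the intent of the json-type selector concrete, the same selection can be written as plain Python over a decoded document. This only mirrors the logic locally and does not call Cloudant; `select_tag_properties` is a hypothetical helper, and the position in the array is assumed to be known, as stated above.

```python
def select_tag_properties(doc: dict, position: int = 0, wanted: str = "this_tag"):
    """Return tag_properties of the element at `position`, or None if no match."""
    try:
        element = doc["what_i_want"]["is_down_here"][position]
    except (KeyError, IndexError, TypeError):
        return None
    if not isinstance(element, dict):
        return None
    # Mirrors the two selector conditions: the tag value matches
    # and tag_properties exists on the same element.
    if element.get("i_look_for") != wanted or "tag_properties" not in element:
        return None
    return element["tag_properties"]

doc = {
    "what_i_want": {
        "is_down_here": [
            {"i_look_for": "this_tag", "tag_properties": {"this": "that"}},
            {"but_not": {"down": "here"}},
        ]
    }
}
print(select_tag_properties(doc))               # {'this': 'that'}
print(select_tag_properties(doc, position=1))   # None
```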
There is another way to answer this question with a different CQ index type. This article explains the differences, and has helpful examples that show querying arrays. Now that you know a little more about the different index types, here's how you'd answer your question with a Lucene search/"text"-type CQ index:
text-type CQ index
{
"index": {
"fields": [
{"name": "what_i_want.is_down_here.[]", "type": "string"}
]
},
"type": "text"
}
Selector against text-type index
{
"selector": {
"what_i_want.is_down_here": {
"$and": [
{"$elemMatch": {"i_look_for": "this_tag"}},
{"$elemMatch": {"tag_properties": {"$exists": true}}}
]
}
},
"fields": [
"_id",
"what_i_want.is_down_here"
]
}
Read the article and you'll learn that each approach has its tradeoffs: json-type indexes are smaller and less flexible (can only index specific elements); text-type is larger but more flexible (can index all array elements). And from this example, you can also see that the projected values also come with some tradeoffs (projecting specific values vs. the entire array).
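One detail of the text-type selector is worth spelling out: as far as I understand `$elemMatch`, the two clauses under `$and` are evaluated independently, so they can be satisfied by two different array elements. The Python sketch below emulates that logic locally on a decoded document (it does not query Cloudant, and both function names are hypothetical) and contrasts it with a stricter same-element check.

```python
def _elements(doc: dict) -> list:
    """Return the dict elements of what_i_want.is_down_here, or []."""
    items = doc.get("what_i_want", {}).get("is_down_here", [])
    return [item for item in items if isinstance(item, dict)]

def matches_independent_clauses(doc: dict) -> bool:
    """Emulates $and of two separate $elemMatch clauses."""
    elements = _elements(doc)
    has_tag = any(e.get("i_look_for") == "this_tag" for e in elements)
    has_props = any("tag_properties" in e for e in elements)
    return has_tag and has_props

def matches_same_element(doc: dict) -> bool:
    """Stricter variant: one element must satisfy both conditions."""
    return any(
        e.get("i_look_for") == "this_tag" and "tag_properties" in e
        for e in _elements(doc)
    )

# The tag and the properties live on different elements here.
split_doc = {
    "what_i_want": {
        "is_down_here": [
            {"i_look_for": "this_tag"},
            {"tag_properties": {"this": "that"}},
        ]
    }
}
print(matches_independent_clauses(split_doc))  # True
print(matches_same_element(split_doc))         # False
```

If both conditions must hold for the same element, putting them inside a single `$elemMatch` expresses that more directly.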
More examples in these threads:
Cloudant Selector Query
How to index multidimensional arrays in couchdb
If I'm understanding your question properly, there are two supported ways of doing this according to the docs:
{
"what_i_want": {
"i_look_for": "this_tag"
}
}
should be equivalent to the abbreviated form:
{
"what_i_want.i_look_for": "this_tag"
}
I'm looking for some pointers on mapping a somewhat dynamic structure for consumption by Elasticsearch.
The raw structure itself is json, but the problem is that a portion of the structure contains a variable, rather than the outer elements of the structure being static.
To provide a somewhat redacted example, my json looks like this:
"stat": {
"state": "valid",
"duration": 5
},
"12345-abc": {
"content_length": 5,
"version": 2
},
"54321-xyz": {
"content_length": 2,
"version": 1
}
The first block is easy; Elasticsearch does a great job of mapping the "stat" portion of the structure, and if I were to dump a lot of that data into an index it would work as expected. The problem is that the next 2 blocks are essentially the same thing, but the raw json is formatted in such a way that a unique element has crept into the structure, and Elasticsearch wants to map that by default, generating a map that looks like this:
"stat": {
"properties": {
"state": {
"type": "string"
},
"duration": {
"type": "double"
}
}
},
"12345-abc": {
"properties": {
"content_length": {
"type": "double"
},
"version": {
"type": "double"
}
}
},
"54321-xyz": {
"properties": {
"content_length": {
"type": "double"
},
"version": {
"type": "double"
}
}
}
I'd like the ability to index all of the "content_length" data, but it's getting separated, and with some of the variable names being used, when I drop the data into Kibana I wind up with really long fieldnames that become next to useless.
Is it possible to provide a generic tag to the structure? Or is this more trivially addressed at the json generation phase, with our developers hard coding a generic structure name and adding an identifier field name.
Any insight / help greatly appreciated.
Thanks!
If keys like 12345-abc are generated and can take an unbounded number of values, it will be hard (if not impossible) to run useful queries or aggregations. It's not really clear which exact use case you have for analyzing your data, but you should probably have a look at nested objects (https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html) and generate your input json according to what you want to query for. It seems that you will get better aggregation results if you put these additional objects into an array, with a dedicated field containing what is currently your key.
{
"stat": ...,
"things": [
{
"thingkey": "12345-abc",
"content_length": 5,
"version": 2
},
...
]
}
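If the producers of the JSON cannot be changed right away, the same reshaping can be done just before indexing. Here is a minimal Python sketch, assuming "stat" is the only static top-level key and every other top-level value is an object; `restructure` and its parameters are hypothetical names, and "thingkey" is taken from the example above.

```python
def restructure(raw: dict, static_keys=("stat",), key_field: str = "thingkey") -> dict:
    """Move dynamically named top-level objects into a "things" array."""
    # Static keys are copied through unchanged.
    result = {key: raw[key] for key in static_keys if key in raw}
    # Every other key becomes the value of key_field inside its own object.
    result["things"] = [
        {key_field: key, **value}
        for key, value in raw.items()
        if key not in static_keys
    ]
    return result

raw = {
    "stat": {"state": "valid", "duration": 5},
    "12345-abc": {"content_length": 5, "version": 2},
    "54321-xyz": {"content_length": 2, "version": 1},
}
print(restructure(raw))
```

With this shape the mapping contains a single `things.content_length` field instead of one field per generated key.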
I'm trying to create a JSON schema for an existing JSON file that looks something like this:
{
"variable": {
"name": "age",
"type": "integer"
}
}
In the schema, I want to ensure the type property has the value string or integer:
{
"variable": {
"name": "string",
"type": {
"type": "string",
"enum": ["string", "integer"]
}
}
}
Unfortunately it blows up with message: ValidationError {is not any of [subschema 0]....
I've read that there are "no reserved words" in JSON schema, so I assume a type of type is valid, assuming I declare it correctly?
The accepted answer from jruizaranguren doesn't actually answer the question.
The problem is that given JSON (not JSON schema, JSON data) that has a field named "type", it's hard to write a JSON schema that doesn't choke.
Imagine that you have an existing JSON data feed (data, not schema) that contains:
"ids": [ { "type": "SSN", "value": "123-45-6789" },
{ "type": "pay", "value": "8675309" } ]
What I've found in trying to work through the same problem is that instead of putting
"properties": {
"type": { <======= validation chokes on this
"type": "string"
}
you can put
"patternProperties": {
"^type$": {
"type": "string"
}
but I'm still working through how to mark it as a required field. It may not be possible.
I think, based on looking at the "schema" in the original question, that JSON schemas have evolved quite a lot since then - but this is still a problem. There may be a better solution.
According to the specification, in the Valid types section for type:
The value of this keyword MUST be either a string or an array. If it is an array, elements of the array MUST be strings and MUST be unique.
String values MUST be one of the seven primitive types defined by the core specification.
Later, in Conditions for successful validation:
An instance matches successfully if its primitive type is one of the types defined by keyword. Recall: "number" includes "integer".
In your case:
{
"variable": {
"name": "string",
"type": ["string", "integer"]
}
}
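Assuming the goal is to constrain the instance's own "type" property (rather than to use the `type` keyword), one way to write it is to nest the declaration under `properties`, where a property that happens to be called "type" is declared like any other. In JSON Schema, `required` is an array of property names placed on the enclosing object schema, which also covers marking the field as required. The Python sketch below holds such a schema as a dict and checks it with a toy `validate` function; that function is hypothetical, supports only the four keywords used here, and is no substitute for a real validator library.

```python
schema = {
    "type": "object",
    "properties": {
        "variable": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                # A property named "type", constrained to two allowed values.
                "type": {"type": "string", "enum": ["string", "integer"]},
            },
            "required": ["name", "type"],
        }
    },
    "required": ["variable"],
}

def validate(instance, schema: dict) -> list:
    """Toy check for: type (object/string only), enum, properties, required."""
    expected = schema.get("type")
    if expected == "object" and not isinstance(instance, dict):
        return ["expected an object"]
    if expected == "string" and not isinstance(instance, str):
        return ["expected a string"]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{instance!r} is not one of {schema['enum']}")
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"missing required property {name!r}")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema))
    return errors

print(validate({"variable": {"name": "age", "type": "integer"}}, schema))  # []
print(validate({"variable": {"name": "age", "type": "float"}}, schema))
print(validate({"variable": {"name": "age"}}, schema))
```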