I have an object definition in common.json file that I need to use in number of other JSON files in terms of reusability. Is there any way to include my common.json file into other JSON files?
Edit:
I came across JSON Pointer while searching which made me thought JSON alone can handle it. To be more clear:
common.json
{
"common":
{
"course":
{
"type": "object",
"properties":
{
"course_name": { "type": "string" },
"course_id": { "type": "integer" },
"course_room": { "type": "integer" }
}
}
}
}
other.json
{
"weekly_schedule":
{
"mathematics": { "$ref": "common.json#/course" },
"history": { "$ref": "common.json#/course" }
}
}
What I understand from here is I can refer to a common JSON object from elsewhere using its path and the $ref keyword. Is that correct or am I missing some point?
JSON is a very simple metaformat. If you take a look at its specification, you will find how simple it is. In particular, it doesn't define any means of aggregation, namespaces, schemata like they are available in XML.
If you want to manipulate JSON or compose different JSON-files, you either treat them as a whole (i.e. as text) and then apply text tools or you decode them, manipulate the received data and then encode the results again.
No, JSON is just text. It doesn't do anything on it's own.
Related
Wondering if I can create a "dynamic mapping" within an elasticsearch index. The problem I am trying to solve is the following: I have a schema that has an attribute that contains an object that can differ greatly between records. I would like to mirror this data within elasticsearch if possible but believe that automatic mapping may get in the way.
Imagine a scenario where I have a schema like the following:
{
name: string
origin: string
payload: object // can be of any type / schema
}
Is it possible to create a mapping that supports this? I do not need to query the records by this payload attribute, but it would be great if I can.
Note that I have checked the documentation but am confused on if what elastic calls dynamic mapping is what I am looking for.
It's certainly possible to specify which queryable fields you expect the payload to contain and what those fields' mappings should be.
Let's say each doc will include the fields payload.livemode and payload.created_at. If these are the only two fields you'll want to perform queries on, and you'd like to disable dynamic, index-time mappings autogenerated by Elasticsearch for the rest of the fields, you can use dynamic templates like so:
PUT my-payload-index
{
"mappings": {
"dynamic_templates": [
{
"variable_payload": {
"path_match": "payload",
"mapping": {
"type": "object",
"dynamic": false,
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"livemode": {
"type": "boolean"
}
}
}
}
}
],
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Then, as you ingest your docs:
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": false,
"abc":"def"
}
}
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": true,
"modified_at": "2021-04-05 09:00:00"
}
}
and verify with
GET my-payload-index/_mapping
no new mappings will be generated for the fields payload.abc nor payload.modified_at.
Not only that — the new fields will also be ignored, as per the documentation:
These fields will not be indexed or searchable, but will still appear in the _source field of returned hits.
Side note: if fields are neither stored nor searchable, they're effectively the opposite of enabled.
The Big Picture
Working with variable contents of a single, top-level object is quite standard. Take for instance the stripe event object — each event has an id, an api_version and a few other shared params. Then there's the data object that's analogous to your payload field.
Now, all is fine, until you need to aggregate on the contents of your payload. See, since the content is variable, so are the data paths / accessors. But wildcards in aggregation paths don't work in Elasticsearch. Scripts do but are onerous to maintain.
Back to stripe. They partially solved it through what they call polymorphic, typed hashes — as discussed in their blog on API design:
A pretty neat approach that's worth emulating.
P.S. I discuss dynamic templates in more detail in the chapter "Mapping Automation" of my ES Handbook.
I want to describe this JSON:
{
"key1": {},
"key2": {}
}
so I created this JSON schema:
{
"type": "object",
"patternProperties": {
".+": {
"type": "object"
}
}
}
The problem is that when I add a $schema link to the JSON, it's not valid:
First, it seems strange that $schema would need any kind of special treatment but even if I try this:
{
"type": "object",
"properties": {
"$schema": {
"type": "string"
}
},
"patternProperties": {
".+": {
"type": "object"
}
}
}
it's not fixed:
I'm browsing a couple of schemas at http://schemastore.org/json/ and they don't seem to have any special treatment of $schema. How does it work?
The $schema keyword is used to declare that a JSON fragment is actually a piece of JSON Schema.
But it's not used in your JSON when it's not a schema, i.e. it is not used in your JSON data.
You then use a validator to match the schema against the JSON data. For exmple you can use this validator. In the left you specify the schema in the right you specify JSON data (without any reference or link to the schema, you don't use the $schema keyword in the right side)
The $schema keyword specifies which version of the JSON Schema standard the schema applies to (again the JSON schema, not the JSON data). Most of the time is:
"$schema": "http://json-schema.org/draft-04/schema#"
More info here
The accepted answer is correct, but here is the workaround you need.
{
"type": "object",
"properties": {
"$schema": {
"type": "string"
}
},
"additionalProperties": {
"type": "object"
}
}
additionalProperites applies only to properties that are defined in properties. patternProperties, on the other hand, applies to any property that it matches. The way you wrote it with patternProperties means that "$schema" must be a string and it must be an object. Since those two things can never both be true, "$schema" will never validate with any value.
One have the following JSON object:
{
"index": 10,
"data": "<?xml version=\"1.0\"?>..."
}
the corresponding schema:
{
"title": "Example",
"type": "object",
"properties": {
"index": {
"type": "integer"
},
"data": {
"type": "string"
}
}
}
What I'm trying to achieve is to validate XML inside data property with XSD schema.
How to represent XML data type with xsd schema attribute correctly from the point of JSON Schema specs?
Short answer
You can't
Long answer
You really can't. No JSON processor in the history of mankind will be able to validate inline XML against an XSD.
The only thing you can do is include the XSD file as text and then the consumer of the JSON can then do the validation on their side. Or, even better, validate the XML before you place it in the JSON doc.
I'm looking for some pointers on mapping a somewhat dynamic structure for consumption by Elasticsearch.
The raw structure itself is json, but the problem is that a portion of the structure contains a variable, rather than the outer elements of the structure being static.
To provide a somewhat redacted example, my json looks like this:
"stat": {
"state": "valid",
"duration": 5,
},
"12345-abc": {
"content_length": 5,
"version": 2
}
"54321-xyz": {
"content_length": 2,
"version", 1
}
The first block is easy; Elasticsearch does a great job of mapping the "stat" portion of the structure, and if I were to dump a lot of that data into an index it would work as expected. The problem is that the next 2 blocks are essentially the same thing, but the raw json is formatted in such a way that a unique element has crept into the structure, and Elasticsearch wants to map that by default, generating a map that looks like this:
"stat": {
"properties": {
"state": {
"type": "string"
},
"duration": {
"type": "double"
}
}
},
"12345-abc": {
"properties": {
"content_length": {
"type": "double"
},
"version": {
"type": "double"
}
}
},
"54321-xyz": {
"properties": {
"content_length": {
"type": "double"
},
"version": {
"type": "double"
}
}
}
I'd like the ability to index all of the "content_length" data, but it's getting separated, and with some of the variable names being used, when I drop the data into Kibana I wind up with really long fieldnames that become next to useless.
Is it possible to provide a generic tag to the structure? Or is this more trivially addressed at the json generation phase, with our developers hard coding a generic structure name and adding an identifier field name.
Any insight / help greatly appreciated.
Thanks!
If those keys like 12345-abc are generated and possibly infinite values, it will get hard (if not impossible) to do some useful queries or aggregations. It's not really clear which exact use case you have for analyzing your data, but you should probably have a look at nested objects (https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html) and generate your input json accordingly to what you want to query for. It seems that you will have better aggregation results if you put these additional objects into an array with a special field containing what is currently your key.
{
"stat": ...,
"things": [
{
"thingkey": "12345-abc",
"content_length": 5,
"version": 2
},
...
]
}
I'm trying to create a JSON schema for an existing JSON file that looks something like this:
{
"variable": {
"name": "age",
"type": "integer"
}
}
In the schema, I want to ensure the type property has the value string or integer:
{
"variable": {
"name": "string",
"type": {
"type": "string",
"enum": ["string", "integer"]
}
}
}
Unfortunately it blows up with message: ValidationError {is not any of [subschema 0]....
I've read that there are "no reserved words" in JSON schema, so I assume a type of type is valid, assuming I declare it correctly?
The accepted answer from jruizaranguren doesn't actually answer the question.
The problem is that given JSON (not JSON schema, JSON data) that has a field named "type", it's hard to write a JSON schema that doesn't choke.
Imagine that you have an existing JSON data feed (data, not schema) that contains:
"ids": [ { "type": "SSN", "value": "123-45-6789" },
{ "type": "pay", "value": "8675309" } ]
What I've found in trying to work through the same problem is that instead of putting
"properties": {
"type": { <======= validation chokes on this
"type": "string"
}
you can put
"patternProperties": {
"^type$": {
"type": "string"
}
but I'm still working through how to mark it as a required field. It may not be possible.
I think, based on looking at the "schema" in the original question, that JSON schemas have evolved quite a lot since then - but this is still a problem. There may be a better solution.
According to the specification, in the Valid typessection for type:
The value of this keyword MUST be either a string or an array. If it is an array, elements of the array MUST be strings and MUST be unique.
String values MUST be one of the seven primitive types defined by the core specification.
Later, in Conditions for successful validation:
An instance matches successfully if its primitive type is one of the types defined by keyword. Recall: "number" includes "integer".
In your case:
{
"variable": {
"name": "string",
"type": ["string", "integer"]
}
}