Elastic Search mapping bad mapping - json

I have the following mapping for an Elasticsearch index. I am sending it with a PUT request to http://abc.com/test/article/_mapping.
{
"article": {
"settings": {
"analysis": {
"analyzer": {
"stem": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"stop",
"porter_stem"
]
}
}
}
},
"mappings": {
"properties": {
"DocumentID": {
"type": "string"
},
"ContentSource": {
"type": "integer"
},
"ContentType": {
"type": "integer"
},
"PageTitle": {
"type": "string",
"analyzer": "stem"
},
"ContentBody": {
"type": "string",
"analyzer": "stem"
},
"URL": {
"type": "string"
}
}
}
}
}
I get an OK message from Elasticsearch, but when I go to http://abc.com/test/article/_mapping, I don't see the settings of the mapping. All I see is this:
{ "article" : { "properties" : { } }}
I had this working before I added the settings portion for the analyzer. Any help is appreciated!

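I figured it out. The outer "article" wrapper needs to be deleted so that "settings" and "mappings" are the top-level keys, and the PUT should be sent to the index URL (http://abc.com/index) instead of the _mapping endpoint.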
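A minimal Python sketch of that fix, for illustration only. It unwraps the outer "article" key and builds (without sending) a PUT request against the index URL. The helper name `split_index_body`, the shortened mapping, and the assumption that the index is called "test" (taken from the URL in the question) are illustrative and not part of the original answer.

```python
import json
import urllib.request


def split_index_body(wrapped: dict, type_name: str) -> dict:
    """Drop the outer type wrapper so 'settings' and 'mappings' are top-level keys."""
    inner = wrapped[type_name]
    return {"settings": inner["settings"], "mappings": inner["mappings"]}


# Shortened version of the body from the question (only one field kept).
wrapped = {
    "article": {
        "settings": {"analysis": {"analyzer": {"stem": {
            "tokenizer": "standard",
            "filter": ["standard", "lowercase", "stop", "porter_stem"],
        }}}},
        "mappings": {"properties": {
            "PageTitle": {"type": "string", "analyzer": "stem"},
        }},
    }
}

body = split_index_body(wrapped, "article")

# Build the request against the index URL, not /test/article/_mapping.
# Assumption: the index is named "test". Nothing is sent here.
request = urllib.request.Request(
    "http://abc.com/test",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
print(request.get_method(), request.full_url)
print(json.dumps(body, indent=2))
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a cluster.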

Related

django-elasticsearch-dsl-drf with JSONfield

I'm using the django-elasticsearch-dsl-drf package and I have a Postgres JSONField that I want to index. I tried to use a NestedField in the document without any properties, since the JSON field is arbitrary, but I'm not able to search on that field, and I don't see anything related to this in the package documentation.
Any idea how can I achieve this?
Mapping:
{
"mappings": {
"_doc": {
"properties": {
"jsondata": {
"type": "nested",
"properties": {
"timestamp": {
"type": "date"
},
"gender": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"group_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}, ...
I want to search on that field like jsondata.gender = x
Query for jsondata.gender = x (the nested path must match the field name in the mapping, which is "jsondata"):
GET <index>/_search
{
"query": {
"nested": {
"path": "jsondata",
"query": {
"term": {
"jsondata.gender.keyword": {
"value": "x"
}
}
}
}
}
}
NOTE: The query has not been verified using Kibana Dev Tools.
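As a rough sketch, the same nested term query can be assembled in Python so that the path and the field name cannot drift apart. The helper `nested_term_query` is a hypothetical name, and "jsondata" is the field name taken from the mapping in the question. Like the query above, it has not been run against a cluster.

```python
import json


def nested_term_query(path: str, field: str, value: str) -> dict:
    """Build a nested query with an exact-match term clause on path.field."""
    full_field = f"{path}.{field}"
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {full_field: {"value": value}}},
            }
        }
    }


# "gender.keyword" targets the keyword sub-field defined in the mapping.
query = nested_term_query("jsondata", "gender.keyword", "x")
print(json.dumps(query, indent=2))
```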

What is wrong with this elastic json query, mapping?

I am trying to use nested JSON to query DB records. Here is my query -
"query": {
"nested": {
"path": "metadata.technical",
"query": {
"bool": {
"must": [
{
"term": {
"metadata.technical.key": "techcolor"
}
},
{
"term": {
"metadata.technical.value": "red"
}
}
]
}
}
}
}
Here is this part in my mapping.json -
"metadata": {
"include_in_parent": true,
"properties": {
"technical": {
"type": "nested",
"properties": {
"key": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
And I have a table with a column 'value' whose content is:
{"technical":
{
"techname22": "test",
"techcolor":"red",
"techlocation": "usa"
}
}
Why can't I get any results with this? FYI, I am using ES 1.7. Thanks for any help.
To respect the mapping you've defined your sample document should look like this:
{
"technical": [
{
"key": "techname22",
"value": "test"
},
{
"key": "techcolor",
"value": "red"
},
{
"key": "techlocation",
"value": "usa"
}
]
}
Changing your document with the above structure would make your query work as it is.
The real mapping of this document:
{
"technical": {
"techname22": "test",
"techcolor": "red",
"techlocation": "usa"
}
}
Is more like this:
{
"include_in_parent": true,
"properties": {
"technical": {
"type": "nested",
"properties": {
"techname22": {
"type": "string"
},
"techcolor": {
"type": "string"
},
"techlocation": {
"type": "string"
}
}
}
}
}
If all your keys are dynamic and not known in advance, you can also configure your mapping to be dynamic as well, i.e. don't define any fields in the nested type and new fields will be added if not already present in the mapping:
{
"include_in_parent": true,
"properties": {
"technical": {
"type": "nested",
"properties": {
}
}
}
}
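If you go with the key/value layout that matches the original mapping, the flat object has to be reshaped before indexing. A minimal sketch in Python, assuming the document is available as a dict; the helper name `to_key_value_pairs` is hypothetical:

```python
def to_key_value_pairs(flat: dict) -> list:
    """Turn {"k": "v", ...} into [{"key": "k", "value": "v"}, ...]."""
    return [{"key": key, "value": value} for key, value in flat.items()]


# The document as stored in the 'value' column of the question.
document = {
    "technical": {
        "techname22": "test",
        "techcolor": "red",
        "techlocation": "usa",
    }
}

# Each entry becomes one nested object with "key" and "value" fields,
# which is what the nested query on metadata.technical expects.
reshaped = {"technical": to_key_value_pairs(document["technical"])}
print(reshaped)
```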

Validating JSON data based on the value of another field

One of the fields in my data is as follows:
{
"license": {
"url": "<some url>",
"label": "<some label>"
}
}
I would like to validate that, for example, if the user provides either of the values for "url":
["http://creativecommons.org/licenses/by/4.0/",
"https://creativecommons.org/licenses/by/4.0/"]
That the value of label must be one of:
["CC-BY", "CC BY 4.0", "CC-BY 4.0"]
There are several such url/label groups, each accepting the HTTP or HTTPS form of the URL. I tried the following, but the schema itself failed to validate, and I couldn't find anything about constraining one field's value based on another's, only about requiring one field when another is present (dependencies).
"license": {
"type": "object",
"properties": {
"oneOf": [
{
"url": { "enum": ["http://creativecommons.org/licenses/by/4.0/", "https://creativecommons.org/licenses/by/4.0/"] },
"label": { "enum": ["CC-BY", "CC BY 4.0", "CC-BY 4.0"] }
},
{
"url": { "enum": ["http://creativecommons.org/publicdomain/zero/1.0/", "https://creativecommons.org/publicdomain/zero/1.0/"] },
"label": { "enum": ["CC-0", "CC0", "CC0 1.0 Universal", "CC0 1.0"] }
},
{
"url": { "enum": ["http://creativecommons.org/licenses/by/3.0/", "https://creativecommons.org/licenses/by/3.0/"] },
"label": { "enum": ["CC-BY", "CC-BY 3.0"] }
}
... <and so on>
I tried a couple of different iterations of dependencies/properties in oneOf and nothing seemed to work.
I found a method that works: "oneOf" should sit at the same level as "properties", and each branch inside it should wrap its constraints in its own "properties" object, as below.
"license": {
"type": "object",
"properties": {
"url": { "type": "string", "format": "uri" },
"label": { "type": "string" },
"logo": { "type": "string", "format": "uri" }
},
"required": ["url", "label"],
"oneOf": [
{
"properties": {
"url": { "enum": ["http://creativecommons.org/licenses/by/4.0/", "https://creativecommons.org/licenses/by/4.0/"] },
"label": { "enum": ["CC-BY", "CC BY 4.0", "CC-BY 4.0"] }
}
},
{
"properties": {
"url": { "enum": ["http://creativecommons.org/publicdomain/zero/1.0/", "https://creativecommons.org/publicdomain/zero/1.0/"] },
"label": { "enum": ["CC-0", "CC0", "CC0 1.0 Universal", "CC0 1.0"] }
}
},
{
"properties": {
"url": { "enum": ["http://creativecommons.org/licenses/by/3.0/", "https://creativecommons.org/licenses/by/3.0/"] },
"label": { "enum": ["CC-BY", "CC-BY 3.0"] }
}
},
... <and so on>
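To see why this layout gives the intended pairing, here is a hand-rolled Python sketch of the same rule. It is not a JSON Schema validator; it only mimics the idea that exactly one url/label group must match. The names `LICENSE_RULES`, `matching_rules` and `is_valid_license` are hypothetical. Unlike "properties" in JSON Schema, which passes when a key is absent, this sketch treats a missing url or label as a non-match, which corresponds to the "required" list in the schema.

```python
# Each rule pairs the accepted URLs with the labels allowed for them.
LICENSE_RULES = [
    ({"http://creativecommons.org/licenses/by/4.0/",
      "https://creativecommons.org/licenses/by/4.0/"},
     {"CC-BY", "CC BY 4.0", "CC-BY 4.0"}),
    ({"http://creativecommons.org/publicdomain/zero/1.0/",
      "https://creativecommons.org/publicdomain/zero/1.0/"},
     {"CC-0", "CC0", "CC0 1.0 Universal", "CC0 1.0"}),
    ({"http://creativecommons.org/licenses/by/3.0/",
      "https://creativecommons.org/licenses/by/3.0/"},
     {"CC-BY", "CC-BY 3.0"}),
]


def matching_rules(license_obj: dict) -> list:
    """Return the indexes of all rules whose URL set and label set both match."""
    url = license_obj.get("url")
    label = license_obj.get("label")
    return [
        index
        for index, (urls, labels) in enumerate(LICENSE_RULES)
        if url in urls and label in labels
    ]


def is_valid_license(license_obj: dict) -> bool:
    """Mimic oneOf: valid only when exactly one rule matches."""
    return len(matching_rules(license_obj)) == 1


good = {"url": "https://creativecommons.org/licenses/by/4.0/", "label": "CC BY 4.0"}
bad = {"url": "https://creativecommons.org/licenses/by/3.0/", "label": "CC BY 4.0"}
print(is_valid_license(good), is_valid_license(bad))
```

Note that "CC-BY" appears under both the 4.0 and 3.0 groups; this is fine because the URL sets do not overlap, so at most one group can match a given object.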

JSON schema: Why does "constant" not validate the same way as a single-valued "enum"?

I have an object that provides a sort of audit log of versions of an asset. A couple of its properties (versionSource.metadata and versionSource.files) are objects that should validate against one of two schemas, depending on the value of one of their properties. I started off using a constant in my sub-schemas (inside the oneOf), but the validator reported that all of the sub-schemas validated, which breaks the oneOf because more than one matched. Changing it to a single-valued enum worked, though.
Why the difference in validation?
Here's the original schema:
{
"$id": "https://example.com/schemas/asset-version.json",
"title": "Audit log of asset versions",
"$schema": "http://json-schema.org/draft-07/schema",
"type": "object",
"required": [
"assetID",
"version",
"versionSource"
],
"properties": {
"assetID": {
"type": "string"
},
"version": {
"type": "integer",
"minimum": 1
},
"versionSource": {
"type": "object",
"properties": {
"metadata": {
"type": "object",
"oneOf": [
{
"properties": {
"sourceType": { "constant": "client" }
}
},
{
"$ref": "#/definitions/version-source-previous-version"
}
]
},
"files": {
"type": "object",
"oneOf": [
{
"properties": {
"sourceType": { "constant": "upload" },
"sourceID": {
"type": "string"
}
}
},
{
"$ref": "#/definitions/version-source-previous-version"
}
]
}
}
}
},
"definitions": {
"version-source-previous-version": {
"properties": {
"sourceType": { "constant": "previous-version" },
"sourceID": {
"type": "integer",
"minimum": 1
}
}
}
}
}
Here's one example document:
{
"assetID": "0150a186-068d-43e7-bb8b-0a389b572379",
"version": 1,
"versionSource": {
"metadata": {
"sourceType": "client"
},
"files": {
"sourceType": "upload",
"sourceID": "54ae67b0-3e42-464a-a93f-3143b0f078fc"
}
},
"created": "2018-09-01T00:00:00.00Z",
"lastModified": "2018-09-02T12:10:00.00Z",
"deleted": "2018-09-02T12:10:00.00Z"
}
And one more:
{
"assetID": "0150a186-068d-43e7-bb8b-0a389b572379",
"version": 2,
"versionSource": {
"metadata": {
"sourceType": "previous-version",
"sourceID": 1
},
"files": {
"sourceType": "previous-version",
"sourceID": 1
}
},
"created": "2018-09-01T00:00:00.00Z",
"lastModified": "2018-09-02T12:10:00.00Z",
"deleted": "2018-09-02T12:10:00.00Z"
}
Here's the error I get:
Message: JSON is valid against more than one schema from 'oneOf'. Valid schema indexes: 0, 1.
Schema path:
https://example.com/schemas/asset-version.json#/properties/versionSource/properties/metadata/oneOf
Since sourceType is a constant in both schemas inside the oneOf, I'm really not sure how my object could possibly be valid against both schemas.
Changing the schema to the following, though, worked:
{
"$id": "https://example.com/schemas/asset-version.json",
"title": "Audit log of asset versions",
"$schema": "http://json-schema.org/draft-07/schema",
"type": "object",
"required": [
"assetID",
"version",
"versionSource"
],
"properties": {
"assetID": {
"type": "string"
},
"version": {
"type": "integer",
"minimum": 1
},
"versionSource": {
"type": "object",
"properties": {
"metadata": {
"type": "object",
"oneOf": [
{
"properties": {
"sourceType": { "enum": [ "client" ] }
}
},
{
"$ref": "#/definitions/version-source-previous-version"
}
]
},
"files": {
"type": "object",
"oneOf": [
{
"properties": {
"sourceType": { "enum": [ "upload" ] },
"sourceID": {
"type": "string"
}
}
},
{
"$ref": "#/definitions/version-source-previous-version"
}
]
}
}
}
},
"definitions": {
"version-source-previous-version": {
"properties": {
"sourceType": { "enum": [ "previous-version" ] },
"sourceID": {
"type": "integer",
"minimum": 1
}
}
}
}
}
What am I missing?
It was my own typo ... constant should have been const. :facepalm:
According to draft 7:
It should be noted that const is merely syntactic sugar for an enum with a single element, therefore the following are equivalent:
{ "const": "United States of America" }
{ "enum": [ "United States of America" ] }
Some might also find it useful to provide the default keyword, so that form-rendering tools preselect that single choice.
Hmm.. nothing is jumping out at me as incorrect. Since you're using draft-07 you could try writing it with if/then/else and see if the error is more helpful.
But...
Are you certain that the implementation you are using understands draft-07? If it ignored the $schema and ran the schema through draft-04 rules, it would not understand const. You should check your tool's documentation for this.
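The behaviour discussed in these answers can be illustrated with a toy evaluator in Python. It is only a sketch that supports "const", "enum" and "properties" and skips every other keyword, which is how validators generally treat keywords they do not recognise; it is not a real JSON Schema implementation. The function names and the reduced sub-schemas are assumptions made for the illustration.

```python
def matches(schema: dict, instance) -> bool:
    """Toy check: honours 'const', 'enum' and 'properties'; ignores anything else."""
    if "const" in schema and instance != schema["const"]:
        return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if "properties" in schema and isinstance(instance, dict):
        for name, subschema in schema["properties"].items():
            # A property constraint only applies when the property is present.
            if name in instance and not matches(subschema, instance[name]):
                return False
    return True


def one_of_matches(subschemas: list, instance) -> list:
    """Return the indexes of the oneOf branches that the instance satisfies."""
    return [i for i, sub in enumerate(subschemas) if matches(sub, instance)]


doc = {"sourceType": "client"}

# "constant" is not a keyword, so both branches place no constraint at all.
typo_branches = [
    {"properties": {"sourceType": {"constant": "client"}}},
    {"properties": {"sourceType": {"constant": "previous-version"}}},
]

# With "const" (or a single-valued "enum") only the first branch matches.
fixed_branches = [
    {"properties": {"sourceType": {"const": "client"}}},
    {"properties": {"sourceType": {"enum": ["previous-version"]}}},
]

print(one_of_matches(typo_branches, doc))   # both branches match
print(one_of_matches(fixed_branches, doc))  # only the first branch matches
```

With the misspelled keyword every branch accepts the object, which lines up with the "Valid schema indexes: 0, 1" error in the question. The same effect would be expected from a validator that does not know "const" at all.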

configure an elasticsearch index with json not taking

I'm using the following JSON to configure Elasticsearch. The goal is to set up the index and the type in one swoop (this is the requirement, since I'm setting up Docker images). This is as far as I've gotten that still allows Elasticsearch to start successfully. The problem is that the index isn't created, yet no error is reported. Other forms I've tried prevent the service from starting.
{
"cluster": {
"name": "MyClusterName"
},
"node": {
"name": "MyNodeName"
},
"indices": {
"number_of_shards": 4,
"index.number_of_replicas": 4
},
"index": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer",
"filter": "lowercase"
},
"my_lowercase_whitespace_analyzer": {
"tokenizer": "whitespace",
"filter": "lowercase"
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "20"
}
}
},
"index": {
"settings": {
"_id": "indexindexer"
},
"mappings": {
"inventoryIndex": {
"_id": {
"path": "indexName"
},
"_routing": {
"required": true,
"path": "indexName"
},
"properties": {
"indexName": {
"type": "string",
"index": "not_analyzed"
},
"startedOn": {
"type": "date",
"index": "not_analyzed"
},
"deleted": {
"type": "boolean",
"index": "not_analyzed"
},
"deletedOn": {
"type": "date",
"index": "not_analyzed"
},
"archived": {
"type": "boolean",
"index": "not_analyzed"
},
"archivedOn": {
"type": "date",
"index": "not_analyzed"
},
"failure": {
"type": "boolean",
"index": "not_analyzed"
},
"failureOn": {
"type": "date",
"index": "not_analyzed"
}
}
}
}
}
}
}
I may have a workaround using curl in a post-boot script but I would prefer to have the configuration handled in the config file.
Thanks!
It appears that Elasticsearch will not allow all of the configuration to be done in a single yml file. The workaround I've found is to create an index template and place it in the <es-config>/templates/ dir; then, after spinning up the service, I use curl to create the index. The template's index pattern matches the new index name, so the index is provisioned according to the template.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html
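A rough Python sketch of this workaround, under several assumptions: the template name "inventory_template", the pattern "inventory*", the index name "inventory-1" and the host localhost:9200 are all hypothetical, and the layout (a name wrapping "template", "settings" and "mappings") follows the file-based template format described in the linked documentation for older Elasticsearch versions. The mapping is shortened to two fields. The code only builds the file content and the index-creation request (the equivalent of the curl call); it does not write to disk or contact a cluster.

```python
import json
import urllib.request

# Template body: applied to any new index whose name matches the pattern.
index_template = {
    "template": "inventory*",  # hypothetical index name pattern
    "settings": {"number_of_shards": 4, "number_of_replicas": 4},
    "mappings": {
        "inventoryIndex": {
            "properties": {
                "indexName": {"type": "string", "index": "not_analyzed"},
                "startedOn": {"type": "date"},
            }
        }
    },
}

# Content for a file such as <es-config>/templates/inventory_template.json.
template_file_text = json.dumps({"inventory_template": index_template}, indent=2)

# Equivalent of the post-boot curl call: PUT with the new index name.
# The host and index name are assumptions; the request is not sent here.
create_index_request = urllib.request.Request(
    "http://localhost:9200/inventory-1",
    method="PUT",
)

print(template_file_text)
print(create_index_request.get_method(), create_index_request.full_url)
```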