Elasticsearch : Default template does not detect date - json

I have a default template in place which looks like
PUT /_template/abtemp
{
"template": "abt*",
"settings": {
"index.refresh_interval": "5s",
"number_of_shards": 5,
"number_of_replicas": 1,
"index.codec": "best_compression"
},
"mappings": {
"_default_": {
"_all": {
"enabled": false
},
"_source": {
"enabled": true
},
"dynamic_templates": [
{
"message_field": {
"match": "message",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"omit_norms": true,
"fielddata": {
"format": "disabled"
}
}
}
},
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"omit_norms": true,
"fielddata": {
"format": "disabled"
},
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
The idea here is this:
apply the template to all indices whose names match abt*
only analyze a string field if it is named message; all other string fields will be not_analyzed and will have a corresponding .raw sub-field
Now I try to index some data into it:
curl -s -XPOST hostName:port/indexName/_bulk --data-binary @myFile.json
(note the @ prefix; with #, curl would send the literal string "#myFile.json" instead of the file contents)
and here is the file
{ "index" : { "_index" : "abtclm3","_type" : "test"} }
{ "FIELD1":1, "FIELD2":"2015-11-18 15:32:18"", "FIELD3":"MATTHEWS", "FIELD4":"GARY", "FIELD5":"", "FIELD6":"STARMX", "FIELD7":"AL", "FIELD8":"05/15/2010 11:30", "FIELD9":"05/19/2010 7:00", "FIELD10":"05/19/2010 23:00", "FIELD11":3275, "FIELD12":"LC", "FIELD13":"WIN", "FIELD14":"05/15/2010 11:30", "FIELD15":"LC", "FIELD16":"POTUS", "FIELD17":"WH", "FIELD18":"S GROUNDS", "FIELD19":"OFFICE", "FIELD20":"VISITORS", "FIELD21":"STATE ARRIVAL - MEXICO**", "FIELD22":"08/27/2010 07:00:00 AM +0000", "FIELD23":"MATTHEWS", "FIELD24":"GARY", "FIELD25":"", "FIELD26":"STARMX", "FIELD27":"AL", "FIELD28":"05/15/2010 11:30", "FIELD29":"05/19/2010 7:00", "FIELD30":"05/19/2010 23:00", "FIELD31":3275, "FIELD32":"LC", "FIELD33":"WIN", "FIELD34":"05/15/2010 11:30", "FIELD35":"LC", "FIELD36":"POTUS", "FIELD37":"WH", "FIELD38":"S GROUNDS", "FIELD39":"OFFICE", "FIELD40":"VISITORS", "FIELD41":"STATE ARRIVAL - MEXICO**", "FIELD42":"08/27/2010 07:00:00 AM +0000" }
Note that there are a few fields, such as FIELD2, that should be classified as dates, and FIELD31 should be classified as long. The indexing happens, and when I look at the data I see that the numbers have been correctly classified, but everything else has been put under string. How do I make sure that the fields holding timestamps get classified as dates?

You have a lot of date formats there. You need a template like this one:
{
"template": "abt*",
"settings": {
"index.refresh_interval": "5s",
"number_of_shards": 5,
"number_of_replicas": 1,
"index.codec": "best_compression"
},
"mappings": {
"_default_": {
"dynamic_date_formats":["dateOptionalTime||yyyy-mm-dd HH:mm:ss||mm/dd/yyyy HH:mm||mm/dd/yyyy HH:mm:ss aa ZZ"],
"_all": {
"enabled": false
},
"_source": {
"enabled": true
},
"dynamic_templates": [
{
"message_field": {
"match": "message",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"omit_norms": true,
"fielddata": {
"format": "disabled"
}
}
}
},
{
"dates": {
"match": "*",
"match_mapping_type": "date",
"mapping": {
"type": "date",
"format": "dateOptionalTime||yyyy-mm-dd HH:mm:ss||mm/dd/yyyy HH:mm||mm/dd/yyyy HH:mm:ss aa ZZ"
}
}
},
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"omit_norms": true,
"fielddata": {
"format": "disabled"
},
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
This probably doesn't cover all the formats you have in there; you need to add the remaining ones. The idea is to specify them under dynamic_date_formats, separated by ||, and then to specify them again under the format field of the date mapping itself. Be careful with the pattern letters: MM means month and mm means minute, and hh (with aa for AM/PM) is the 12-hour clock while HH is the 24-hour clock.
To get an idea of how to define them, please see the documentation section on the built-in formats and the documentation on custom formats for anything beyond those.
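Before baking the patterns into the template, it can help to sanity-check them against the sample data locally. This is a rough sketch using Python's strptime tokens rather than Elasticsearch's Joda syntax (%m is month and %M is minute, paralleling Joda's MM vs mm distinction):

```python
from datetime import datetime

# Python strptime equivalents of the date patterns in the template.
# %m = month, %M = minute -- mirroring Joda's MM (month) vs mm (minute).
samples = {
    "2015-11-18 15:32:18": "%Y-%m-%d %H:%M:%S",
    "05/15/2010 11:30": "%m/%d/%Y %H:%M",
    "08/27/2010 07:00:00 AM +0000": "%m/%d/%Y %I:%M:%S %p %z",
}

for value, pattern in samples.items():
    parsed = datetime.strptime(value, pattern)
    print(f"{value} -> {parsed.isoformat()}")
```

If a sample raises a ValueError here, the corresponding Joda pattern in the template is almost certainly wrong too.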

Can't understand this JSON schema from the Swish QR Code API

I'm trying to use an API but the documentation is really bad. I got this JSON schema but I don't understand it. What am I supposed to include in the request?
url: https://mpc.getswish.net/qrg-swish/api/v1/prefilled
I have tried this but it doesn't work:
{
"payee":{
"editable":{
"editable":"false"
},
"swishString":{
"value":"0721876507"
}
},
"size":600,
"border":20,
"transparent":false,
"format":"png"
}
Here's the JSON schema
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Swish pre-filled qr code generator",
"description": "REST interface to get a QR code that the Swish app will interpret as a pre filled code",
"definitions": {
"editable": {
"description ": "Controls if user can modify this value in Swish app or not",
"type": "object",
"properties": {
"editable": {
"type": "boolean",
"default": false
}
}
},
"swishString": {
"type": "object",
"properties": {
"value": {
"type": "string",
"maxLength": 70
}
},
"required": [
"value"
]
},
"swishNumber": {
"type": "object",
"properties": {
"value": {
"type": "number"
}
},
"required": [
"value"
]
}
},
"type": "object",
"properties": {
"format": {
"enum": [
"jpg",
"png",
"svg"
]
},
"payee": {
"description": "Payment receiver",
"allOf": [
{
"$ref": "#/definitions/editable"
},
{
"$ref": "#/definitions/swishString"
}
]
},
"amount": {
"description": "Payment amount",
"allOf": [
{
"$ref": "#/definitions/editable"
},
{
"$ref": "#/definitions/swishNumber"
}
]
},
"message": {
"description": "Message for payment",
"allOf": [
{
"$ref": "#/definitions/editable"
},
{
"$ref": "#/definitions/swishString"
}
]
},
"size": {
"description": "Size of the QR code. The code is a square, so width and height are the same. Not required is the format is svg",
"value": "number",
"minimum": 300
},
"border": {
"description": "Width of the border.",
"type": "number"
},
"transparent": {
"description": "Select background color to be transparent. Do not work with jpg format.",
"type": "boolean"
}
},
"required": [
"format"
],
"anyOf": [
{
"required": [
"payee"
]
},
{
"required": [
"amount"
]
},
{
"required": [
"message"
]
}
],
"additionalProperties": false,
"maxProperties": 5
}
The API should return a QR code.
To be honest, I have not taken the time to learn JSON schema, but your example should probably look something like this:
{
"payee": {
"value": "0721876507",
"editable": false
},
"size": 600,
"border": 20,
"transparent": false,
"format": "png"
}
There are other parameters you may choose to utilize:
{
"payee": {
"value": "1239006032",
"editable": false
},
"message": {
"value": "LIV",
"editable": true
},
"amount": {
"value": 100,
"editable": true
},
"format": "png",
"size": 300,
"border": 0,
"transparent": true
}
Honestly, I think the developers behind the Swish APIs are trying to look smart by complicating things. They should, of course, have provided example JSON data instead of forcing consumers to understand their JSON schema. Also, I believe their published schema is wrong. The second example I provided works even though it doesn't validate according to the JSON schema ("Object property count 7 exceeds maximum count of 5").
Here is a minimal and pretty useless request that returns a valid QR-code
{
"format": "png",
"size": 300
}
And here is a more usable example that works
{
"format": "png",
"size": 300,
"transparent": false,
"amount": {
"value": 999.99,
"editable": true
},
"payee": {
"value": "0701000000",
"editable": false
},
"message": {
"value": "Hello",
"editable": false
}
}
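The property-count conflict in the error message can be checked without any schema validator. A quick sketch in Python, counting the top-level members of the second example above:

```python
import json

# The second working request from above, verbatim: seven top-level members.
request = json.loads("""
{
  "payee":   { "value": "1239006032", "editable": false },
  "message": { "value": "LIV", "editable": true },
  "amount":  { "value": 100, "editable": true },
  "format": "png",
  "size": 300,
  "border": 0,
  "transparent": true
}
""")

# The published schema ends with "maxProperties": 5, yet this request,
# which the API accepts, has more top-level properties than that:
print(len(request))  # -> 7
```

So either the schema or the server is wrong, and in practice it is the schema's maxProperties that is too strict.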

How can I explictly constrain multiple items in a JSON Schema array?

I am creating a JSON schema and want to define an array containing only exact matches for certain items:
An example of the sort of JSON (snippet) would look like:
{
"results":
[
{ "id": 1, "test": true, "volts": 700, "duration": 100 },
{ "id": 2, "test": false }
]
}
This seems to be a combination of OneOf and "additionalProperties": false but I can't work out how that should be used. So far I have:
{
"results":
{
"type": "array",
"items":
{
"type": "object",
"OneOf":
[
{
"id": { "type": "integer" },
"test": { "type": "boolean" },
"volts": { "type": "integer" },
"duration": { "type": "integer" }
},
{
"id": { "type": "integer" },
"test": { "type": "boolean" }
}
],
"additionalProperties": false
}
}
}
I'm using http://www.jsonschemavalidator.net/ to check my JSON.
But when I validate the following JSON against my schema it says it's valid; is the website incorrect or have I done something wrong?
{
"results": [
{
"fred": 7,
"id": 7,
"test": true,
"volts": 7,
"duration": 7
},
{
"fish": 7
}
]
}
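For what it's worth, the reason everything validates is that in the schema above results sits at the root without a properties wrapper, so it is treated as an unknown keyword and ignored, leaving an empty schema that matches anything. A sketch of the corrected draft-04 shape (lowercase oneOf, with properties, required, and additionalProperties inside each branch) might look like:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "results": {
      "type": "array",
      "items": {
        "type": "object",
        "oneOf": [
          {
            "properties": {
              "id": { "type": "integer" },
              "test": { "type": "boolean" },
              "volts": { "type": "integer" },
              "duration": { "type": "integer" }
            },
            "required": ["id", "test", "volts", "duration"],
            "additionalProperties": false
          },
          {
            "properties": {
              "id": { "type": "integer" },
              "test": { "type": "boolean" }
            },
            "required": ["id", "test"],
            "additionalProperties": false
          }
        ]
      }
    }
  }
}
```

With this shape, objects like { "fred": 7, ... } or { "fish": 7 } fail both branches and are rejected.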

Logstash + Elasticsearch template mapping fails to add to Elasticsearch

I'm trying to add a custom template for all logstash indexes in elasticsearch, however whenever I add one, logstash raises a 400 error on all the logs and fails to add anything to elasticsearch.
I'm adding the template using the REST API for elasticsearch:
POST _template/logstash
{
"order": 0,
"template" : "logstash*",
"settings": {
"index.refresh_interval": "5s"
},
"mappings": {
"_default_": {
"_all" : {
"enabled" : true,
"omit_norms": true
},
"dynamic_templates": [
{
"message_field": {
"mapping": {
"index": "analyzed",
"omit_norms": true,
"type": "string"
},
"match_mapping_type": "string",
"match": "message"
}
},
{
"string_fields": {
"mapping": {
"index": "analyzed",
"omit_norms": true,
"type": "string",
"fields": {
"raw": {
"ignore_above": 256,
"index": "not_analyzed",
"type": "string"
}
}
},
"match_mapping_type": "string",
"match": "*"
}
}
],
"properties": {
"geoip": {
"dynamic": true,
"type": "object",
"properties": {
"location": {
"type": "geo_point"
}
}
},
"#version": {
"index": "not_analyzed",
"type": "string"
},
"#fields": {
"type": "object",
"dynamic": true,
"path": "full"
},
"#message": {
"type": "string",
"index": "analyzed"
},
"#source": {
"type": "string",
"index": "not_analyzed"
},
"method": {
"type": "string",
"index": "not_analyzed"
},
"requested": {
"type": "date",
"format": "dateOptionalTime",
"index": "not_analyzed"
},
"response_time": {
"type": "float",
"index": "not_analyzed"
},
"hostname": {
"type": "string",
"index": "not_analyzed"
},
"ip": {
"type": "string",
"index": "not_analyzed"
},
"error": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
You should try adding the template through Logstash instead of using the REST API directly.
In your Logstash configuration:
output {
elasticsearch {
# add additional configurations appropriately
template => # path to the template file you want to use
template_name => "logstash"
template_overwrite => true
}
}
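The file referenced by template => contains the same JSON body you were POSTing to the REST API. A trimmed sketch of what it might hold (put it wherever you keep Logstash config and point the path at it):

```json
{
  "template": "logstash*",
  "settings": {
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": true, "omit_norms": true }
    }
  }
}
```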

configure an elasticsearch index with json not taking

I'm using the following JSON to configure Elasticsearch. The goal is to set up the index and the type in one swoop (this is a requirement, as we are setting up Docker images). This is as far as I've gotten that still allows Elasticsearch to start successfully. The problem is that the index isn't created, yet there is no error. Other forms I've tried prevent the service from starting.
{
"cluster": {
"name": "MyClusterName"
},
"node": {
"name": "MyNodeName"
},
"indices": {
"number_of_shards": 4,
"index.number_of_replicas": 4
},
"index": {
"analysis": {
"analyzer": {
"my_ngram_analyzer": {
"tokenizer": "my_ngram_tokenizer",
"filter": "lowercase"
},
"my_lowercase_whitespace_analyzer": {
"tokenizer": "whitespace",
"filter": "lowercase"
}
},
"tokenizer": {
"my_ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "20"
}
}
},
"index": {
"settings": {
"_id": "indexindexer"
},
"mappings": {
"inventoryIndex": {
"_id": {
"path": "indexName"
},
"_routing": {
"required": true,
"path": "indexName"
},
"properties": {
"indexName": {
"type": "string",
"index": "not_analyzed"
},
"startedOn": {
"type": "date",
"index": "not_analyzed"
},
"deleted": {
"type": "boolean",
"index": "not_analyzed"
},
"deletedOn": {
"type": "date",
"index": "not_analyzed"
},
"archived": {
"type": "boolean",
"index": "not_analyzed"
},
"archivedOn": {
"type": "date",
"index": "not_analyzed"
},
"failure": {
"type": "boolean",
"index": "not_analyzed"
},
"failureOn": {
"type": "date",
"index": "not_analyzed"
}
}
}
}
}
}
}
I may have a workaround using curl in a post-boot script but I would prefer to have the configuration handled in the config file.
Thanks!
It appears that Elasticsearch will not allow all of this configuration to be done in a single YAML file. The workaround I've found is to create an index template, place it in the <es-config>/templates/ directory, and then, after spinning up the service, use curl to create the index. The index-name matching will catch it and provision the index according to the template.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html
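A minimal file-based template along those lines might look like the following (hypothetical names, trimmed to a couple of fields; drop it into <es-config>/templates/ and any index whose name matches the pattern is provisioned from it):

```json
{
  "inventory_template": {
    "template": "inventory*",
    "settings": {
      "number_of_shards": 4,
      "number_of_replicas": 4
    },
    "mappings": {
      "inventoryIndex": {
        "properties": {
          "indexName": { "type": "string", "index": "not_analyzed" },
          "startedOn": { "type": "date" }
        }
      }
    }
  }
}
```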

Elastic Search mapping bad mapping

I have the following mapping for a Elastic Search index. I am posting ("PUT") it to http://abc.com/test/article/_mapping.
{
"article": {
"settings": {
"analysis": {
"analyzer": {
"stem": {
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"stop",
"porter_stem"
]
}
}
}
},
"mappings": {
"properties": {
"DocumentID": {
"type": "string"
},
"ContentSource": {
"type": "integer"
},
"ContentType": {
"type": "integer"
},
"PageTitle": {
"type": "string",
"analyzer": "stem"
},
"ContentBody": {
"type": "string",
"analyzer": "stem"
},
"URL": {
"type": "string"
}
}
}
}
}
I get an OK message from Elastic Search. But when I go to http://abc.com/test/article/_mapping , I don't see the settings of the mapping. All I see is this
{ "article" : { "properties" : { } }}
I had this working before I added the settings portion for the analyzer. Any help is appreciated!
I figured it out. The outer "article" wrapper needs to be deleted, and the PUT should go against the index itself (here, http://abc.com/test) rather than the _mapping endpoint, because settings can only be supplied when the index is created.
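In other words, a trimmed sketch of the body to PUT to http://abc.com/test when creating the index (settings and mappings together, no outer "article" wrapper):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "stem": {
          "tokenizer": "standard",
          "filter": ["standard", "lowercase", "stop", "porter_stem"]
        }
      }
    }
  },
  "mappings": {
    "article": {
      "properties": {
        "DocumentID": { "type": "string" },
        "PageTitle": { "type": "string", "analyzer": "stem" },
        "ContentBody": { "type": "string", "analyzer": "stem" }
      }
    }
  }
}
```

After this, GET http://abc.com/test/article/_mapping should show the analyzer-bearing properties instead of an empty mapping.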