Index a JSON file into Elasticsearch: command/mapping errors

I'm new to ELK and I want to import a JSON file into Elasticsearch. This is my file:
{
"news":{
"1":{
"_score":1.0,
"_index":"newsvit",
"_source":{
"content":" \u0641\u0647\u06cc\u0645\u0647 \u062d\u0633\u0646\u200c\u0645\u06cc\u0631\u06cc: \u0627\u06af\u0631\u0686\u0647 \u062f\u0631 \u0647\u06cc\u0627\u0647\u0648\u06cc \u0627\u0646\u062a\u062e\u0627\u0628\u0627\u062a \u0631\u06cc\u0627\u0633\u062a \u062c\u0645\u0647\u0648\u0631\u06cc\u060c \u0645\u0648\u0636\u0648\u0639\u06cc \u0645\u0627\u0646\u0646\u062f \u0645\u0639\u0631\u0641\u06cc \u06a9\u0627\u0646\u062f\u06cc\u062f\u0627\u0647\u0627\u06cc \u0634\u0648\u0631\u0627\u06cc \u0634\u0647\u0631 \u062f\u0631 \u062d\u0627\u0634\u06cc\u0647 \u0642\u0631\u0627\u0631 \u06af\u0631\u0641\u062a\u0647\u060c \u0627\u0645\u0627 \u0627\u0645\u0633\u0627\u0644 \u0628\u0647 \u0639\u0646\u0648\u0627\u0646 \u067e\u0646\u062c\u0645\u06cc\u0646 \u062f\u0648\u0631\u0647 \u0627\u0646\u062a\u062e\u0627\u0628 \u0627\u0639\u0636\u0627\u06cc \u0634\u0648\u0631\u0627\u06cc \u0634\u0647\u0631\u060c \u0627\u06cc\u0646 \u0631\u0648\u06cc\u062f\u0627\u062f \u0628\u0647 \u0646\u0633\u0628\u062a \u062f\u0648\u0631\u0647\u200c\u0647\u0627\u06cc \u0642\u0628\u0644\u060c \u0628\u06cc\u0634\u062a\u0631 \u0645\u0648\u0631\u062f \u062a\u0648\u062c\u0647 \u0648\u0627\u0642\u0639 \u0634\u062f\u0647. \u0627\u06cc\u0646 \u0627\u0642\u0628\u0627\u0644\u060c \u0686\u0647 \u0627\u0632 \u0633\u0648\u06cc \u0686\u0647\u0631\u0647\u200c\u0647\u0627\u06cc \u0645\u0637\u0631\u062d \u0628\u0631\u0627\u06cc \u062b\u0628\u062a \u0646\u0627\u0645 \u0648 \u0686\u0647 \u0627\u0632 \u0633\u0648\u06cc \u0645\u0631\u062f\u0645 \u0628\u0631\u0627\u06cc \u0645\u0634\u0627\u0631\u06a9\u062a \u062f\u0631 \u0627\u06cc\u0646 \u0631\u0648\u06cc\u062f\u0627\u062f\u060c \u0639\u0644\u062a\u200c\u0647\u0627\u06cc \u06af\u0648\u0646\u0627\u06af\u0648\u0646\u06cc \u0645\u06cc\u200c\u062a\u0648\u0627\u0646\u062f \u062f\u0627\u0634\u062a\u0647 \u0628\u0627\u0634\u062f \u06a9\u0647 \u062a\u0648\u062c\u0647 \u0628\u0647 \u0622\u0646\u060c \u0645\u06cc\u200c\u062a\u0648\u0627\u0646\u062f \u0631\u0627\u0647\u06af\u0634\u0627\u06cc \u0627\u0639\u0636\u0627\u06cc \u0631\u06",
"lead":"\u062c\u0627\u0645\u0639\u0647 > \u0634\u0647\u0631\u06cc - \u0645\u06cc\u0632\u06af\u0631\u062f\u06cc \u062f\u0631\u0628\u0627\u0631\u0647 \u0639\u0645\u0644\u06a9\u0631\u062f \u062f\u0648\u0631\u0647\u200c\u0647\u0627\u06cc \u06af\u0630\u0634\u062a\u0647 \u0634\u0648\u0631\u0627\u06cc \u0634\u0647\u0631\u060c \u0622\u0646\u0686\u0647 \u0627\u0639\u0636\u0627\u06cc \u062c\u062f\u06cc\u062f \u0628\u0627\u06cc\u062f \u0645\u062f \u0646\u0638\u0631 \u062f\u0627\u0634\u062a\u0647 \u0628\u0627\u0634\u0646\u062f \u0648 \u0647\u0645\u0686\u0646\u06cc\u0646 \u0645\u0627\u0647\u06cc\u062a \u0633\u06cc\u0627\u0633\u06cc \u0628\u0648\u062f\u0646 \u06cc\u0627 \u0646\u0628\u0648\u062f\u0646 \u0634\u0648\u0631\u0627\u06cc \u0634\u0647\u0631.",
"agency":"13",
"date_created":1494518193,
"url":"http://www.khabaronline.ir/(X(1)S(bud4wg3ebzbxv51mj45iwjtp))/detail/663749/society/urban",
"image":"uploads/2017/05/11/1589793661.jpg",
"category":"15"
},
"_type":"news",
"_id":"2981643"
},
"2": {
...
Based on what I have learned, I first tried to create a mapping for it in the Dev Tools of Kibana. I want to be able to perform queries and searches on this file based on the fields in _source, such as category, id and so on. This is my mapping:
PUT /main-news-test-data
{
"mappings": {
"properties": {
"_score": {"type":"integer"},
"_index": {"type":"keyword"},
"_type":{"type":"keyword"},
"_id":{"type":"keyword"}
},
"_source":{
"properties": {
"content":{"type":"text"},
"title":{"type":"text"},
"lead":{"type":"text"},
"agency":{"type":"keyword"},
"date_created":{"type":"date"},
"url":{"type":"keyword"},
"image":{"type":"keyword"},
"category":{"type":"keyword"}
}
}
}
}
HEAD main-news-test-data
GET /main-news-test-data/_search?q=*
But when I run this in Dev Tools I receive this error:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "Mapping definition for [_source] has unsupported parameters: [properties : {image={type=keyword}, agency={type=keyword}, date_created={type=date}, title={type=text}, category={type=keyword}, content={type=text}, lead={type=text}, url={type=keyword}}]"
}
],
"type" : "mapper_parsing_exception",
"reason" : "Failed to parse mapping [_doc]: Mapping definition for [_source] has unsupported parameters: [properties : {image={type=keyword}, agency={type=keyword}, date_created={type=date}, title={type=text}, category={type=keyword}, content={type=text}, lead={type=text}, url={type=keyword}}]",
"caused_by" : {
"type" : "mapper_parsing_exception",
"reason" : "Mapping definition for [_source] has unsupported parameters: [properties : {image={type=keyword}, agency={type=keyword}, date_created={type=date}, title={type=text}, category={type=keyword}, content={type=text}, lead={type=text}, url={type=keyword}}]"
}
},
"status" : 400
}
Afterwards, I also tried to index my file into Elasticsearch using this PowerShell command:
Invoke-RestMethod "http://localhost:9200/main-news-test-data/doc/_bulk?pretty" -Method Post -ContentType 'application/x-ndjson' -InFile "test.json"
but again I get this error from PowerShell:
Invoke-RestMethod : {
"error" : {
"root_cause" : [
{
"type" : "json_e_o_f_exception",
"reason" : "Unexpected end-of-input: expected close marker for Object (start marker at [Source:
(org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper); line: 1, column: 1])\n at
[Source: (org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper); line: 2, column: 1]"
}
],
"type" : "json_e_o_f_exception",
"reason" : "Unexpected end-of-input: expected close marker for Object (start marker at [Source:
(org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper); line: 1, column: 1])\n at
[Source: (org.elasticsearch.common.bytes.AbstractBytesReference$MarkSupportingStreamInputWrapper); line: 2, column: 1]"
},
"status" : 400
}
So what should I do? How do I import a JSON file into Elasticsearch so that it is queryable by its fields?

From what I read, I can say that your mapping is wrong: _score, _index, _type and _id are metadata fields that Elasticsearch manages itself, and _source cannot be given properties in a mapping, which is exactly what the mapper_parsing_exception above is complaining about. Map only the document's own fields.
Just put:
PUT /main-news-test-data
{
"mappings": {
"properties": {
"content": {
"type": "text"
},
"title": {
"type": "text"
},
"lead": {
"type": "text"
},
"agency": {
"type": "keyword"
},
"date_created": {
"type": "date"
},
"url": {
"type": "keyword"
},
"image": {
"type": "keyword"
},
"category": {
"type": "keyword"
}
}
}
}
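Once your documents are indexed (see the bulk notes below), you can search on any of these mapped fields. For example, a term query on the category keyword field, using the value from the sample document:
GET /main-news-test-data/_search
{
  "query": {
    "term": {
      "category": "15"
    }
  }
}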
Your JSON is also wrong: the _bulk API doesn't take a single valid JSON document, but newline-delimited JSON, one action line followed by one source line per document.
A file for the _bulk API will look like this:
{ "index" : { "_index" : "main-news-test-data", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_index" : "main-news-test-data", "_id" : "2" } }
{ "field1" : "value2" }
Please also note that "_score": 1.0 has no reason to be in your request, and that _type is deprecated (if you use 7.0+, _type can only be _doc and should be omitted).
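Since your test.json is one big nested object rather than newline-delimited JSON, it has to be reshaped before posting. A minimal PowerShell sketch, assuming the nested "news" layout shown in the question with the indexable fields under _source:
$json = Get-Content "test.json" -Raw | ConvertFrom-Json
$lines = foreach ($entry in $json.news.PSObject.Properties) {
    # one action line per document, reusing the original _id
    @{ index = @{ _index = "main-news-test-data"; _id = $entry.Value._id } } | ConvertTo-Json -Compress
    # followed by the document fields themselves
    $entry.Value._source | ConvertTo-Json -Compress
}
# _bulk requires a trailing newline; WriteAllText writes UTF-8 without a BOM
[System.IO.File]::WriteAllText("$pwd\test.ndjson", ($lines -join "`n") + "`n")
Invoke-RestMethod "http://localhost:9200/main-news-test-data/_bulk?pretty" -Method Post -ContentType 'application/x-ndjson' -InFile "test.ndjson"
Note that the URL no longer contains a doc type, in line with the deprecation note above.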

Related

How to send the below request body to hit a service and get a response

{
"version": 46,
"actions": [
{
"action" : "addCustomLineItem",
"name" : {
"en" : "Global India"
},
"quantity" : 1,
"money" : {
"currencyCode" : "INR",
"centAmount" : 4200
},
"slug" : "mySlug",
"taxCategory" : {
"typeId" : "tax-category",
"id" : "20f5a0ca-e3fd-48b6-8258-8dc8c75fe22a"
}
}
]
}
Above is my request body. We have stored the above data in the cartIdData variable shown below, but I am not getting a response; the service reports that the body does not contain valid JSON.
this.cartIdData = {
version:this.cartversion,
productId:this.productID,
currency:this.currencycode,
price:this.priceparseInt,
cartaction:action,
india:this.name,
slugname:this.slugname
};
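The cartIdData object above does not have the shape of the body at the top of the question: the service expects version plus an actions array with the line-item fields nested inside it. A sketch of the same assignment reshaped to match; how your component fields map onto the body is my assumption:
this.cartIdData = {
  version: this.cartversion,
  actions: [
    {
      action: action, // e.g. "addCustomLineItem", as in the sample body
      name: { en: this.name },
      quantity: 1,
      money: {
        currencyCode: this.currencycode,
        centAmount: this.priceparseInt
      },
      slug: this.slugname
      // taxCategory omitted; add it back as in the sample body if the service requires it
    }
  ]
};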

Raise JSON Schema Validation Error when key does not exist in API Gateway

Having some issues trying to get AWS API Gateway to raise a validation error.
Validation errors are raised correctly when a defined key is provided with the wrong type, but if I add a random key that has not been defined in the model to the request body, API Gateway just carries on processing the request and returns a 200 Success response.
How can I define, in the model, that random keys are not allowed?
Model
{
"type" : "object",
"properties" : {
"bool" : {
"type" : "boolean"
},
"string" : {
"type" : "string"
},
"int" : {
"type" : "integer",
"format" : "int32"
}
}
}
Valid Request Body
API GW Correctly Gives 200 Response
{
"bool": true,
"string": "foo",
"int": 123
}
Invalid Request Body
API GW Correctly Gives 400 Response
{
"bool": "foo",
"string": 123,
"int": false
}
Invalid Request Body
API GW Incorrectly Gives 200 Response
{
"bool": true,
"string": "foo",
"int": 123,
"bar":""
}
Any ideas on what I'm missing in the JSON schema?
I needed to add the following to the model:
"additionalProperties":false
Model
{
"type" : "object",
"additionalProperties":false,
"properties" : {
"bool" : {
"type" : "boolean"
},
"string" : {
"type" : "string"
},
"int" : {
"type" : "integer",
"format" : "int32"
}
}
}
from this issue: Only allow properties that are declared in JSON schema
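With "additionalProperties": false in place, the third request body above fails validation. By default, API Gateway then returns a 400 with a generic gateway response body along these lines (the message can be customized via gateway responses):
{
  "message": "Invalid request body"
}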

Custom analyzer appearing in type mapping but not working in Elasticsearch

I'm trying to add a custom analyzer to my index while also mapping that analyzer to a property on a type. Here is my JSON object for doing this:
{ "settings" : {
"analysis" : {
"analyzer" : {
"test_analyzer" : {
"type" : "custom",
"tokenizer": "standard",
"filter" : ["lowercase", "asciifolding"],
"char_filter": ["html_strip"]
}
}
}
},
"mappings" : {
"test" : {
"properties" : {
"checkanalyzer" : {
"type" : "string",
"analyzer" : "test_analyzer"
}
}
}
}
}
I know this analyzer works because I've tested it using /wp2/_analyze?analyzer=test_analyzer -d '<p>Testing analyzer.</p>' and also it shows up as the analyzer for the checkanalyzer property when I check /wp2/test/_mapping. However, if I add a document like {"checkanalyzer": "<p>The tags should not show up</p>"}, the HTML tags don't get stripped out when I retrieve the document using the _search endpoint. Am I misunderstanding how the mapping works or is there something wrong with my JSON object? I'm dynamically creating the wp2 index and also the test type when I make this call to Elasticsearch, not sure if that matters.
The HTML doesn't get removed from the _source; it gets removed from the terms generated from that source. You can see this if you use a terms aggregation:
POST /test_index/_search
{
"aggs": {
"checkanalyzer_field_terms": {
"terms": {
"field": "checkanalyzer"
}
}
}
}
The response shows the original HTML intact in _source, while the generated terms are clean:
{
"took": 77,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "test",
"_id": "1",
"_score": 1,
"_source": {
"checkanalyzer": "<p>The tags should not show up</p>"
}
}
]
},
"aggregations": {
"checkanalyzer_field_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "not",
"doc_count": 1
},
{
"key": "should",
"doc_count": 1
},
{
"key": "show",
"doc_count": 1
},
{
"key": "tags",
"doc_count": 1
},
{
"key": "the",
"doc_count": 1
},
{
"key": "up",
"doc_count": 1
}
]
}
}
}
Here's some code I used to test it:
http://sense.qbox.io/gist/2971767aa0f5949510fa0669dad6729bbcdf8570
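You can also confirm the stripping at analysis time with the same old-style _analyze call the question used; the char_filter removes the markup before tokenization:
curl -XGET 'localhost:9200/wp2/_analyze?analyzer=test_analyzer' -d '<p>The tags should not show up</p>'
This returns the tokens the, tags, should, not, show, up: exactly the terms in the aggregation above, with no HTML in them.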
Now if you want to completely strip out the HTML prior to indexing, and store the content as is, you can use the mapper attachments plugin: when you define the mapping, you can set the content_type to "html".
The mapper attachment is useful for many things, especially if you are handling multiple document types, but most notably, I believe using it just for the purpose of stripping out the HTML tags is sufficient (which you cannot do with the html_strip char filter).
Just a forewarning though: NONE of the HTML tags will be stored. So if you do need those tags somehow, I would suggest defining another field to store the original content. Another note: you cannot specify multi-fields for mapper attachment documents, so you would need to store that outside of the mapper attachment document. See my working example below.
You'll need to end up with this mapping:
{
"html5-es" : {
"aliases" : { },
"mappings" : {
"document" : {
"properties" : {
"delete" : {
"type" : "boolean"
},
"file" : {
"type" : "attachment",
"fields" : {
"content" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "autocomplete"
},
"author" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets"
},
"title" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "autocomplete"
},
"name" : {
"type" : "string"
},
"date" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"keywords" : {
"type" : "string"
},
"content_type" : {
"type" : "string"
},
"content_length" : {
"type" : "integer"
},
"language" : {
"type" : "string"
}
}
},
"hash_id" : {
"type" : "string"
},
"path" : {
"type" : "string"
},
"raw_content" : {
"type" : "string",
"store" : true,
"term_vector" : "with_positions_offsets",
"analyzer" : "raw"
},
"title" : {
"type" : "string"
}
}
}
},
"settings" : { //insert your own settings here },
"warmers" : { }
}
}
Such that in NEST, I will assemble the content as such:
Attachment attachment = new Attachment();
attachment.Content = Convert.ToBase64String(File.ReadAllBytes("path/to/document"));
attachment.ContentType = "html";
Document document = new Document();
document.File = attachment;
document.RawContent = InsertRawContentFromString(originalText);
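Indexing the assembled document is then a one-liner; this assumes a configured ElasticClient named client:
var response = client.Index(document);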
I have tested this in Sense - results are as follows:
"file": {
"_content": "PGh0bWwgeG1sbnM6TWFkQ2FwPSJodHRwOi8vd3d3Lm1hZGNhcHNvZnR3YXJlLmNvbS9TY2hlbWFzL01hZENhcC54c2QiPg0KICA8aGVhZCAvPg0KICA8Ym9keT4NCiAgICA8aDE+VG9waWMxMDwvaDE+DQogICAgPHA+RGVsZXRlIHRoaXMgdGV4dCBhbmQgcmVwbGFjZSBpdCB3aXRoIHlvdXIgb3duIGNvbnRlbnQuIENoZWNrIHlvdXIgbWFpbGJveC48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+YXNkZjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD4xMDwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5MYXZlbmRlci48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+MTAvNiAxMjowMzwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD41IDA5PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPjExIDQ3PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPkhhbGxvd2VlbiBpcyBpbiBPY3RvYmVyLjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5qb2c8L3A+DQogIDwvYm9keT4NCjwvaHRtbD4=",
"_content_length": 0,
"_content_type": "html",
"_date": "0001-01-01T00:00:00",
"_title": "Topic10"
},
"delete": false,
"raw_content": "<h1>Topic10</h1><p>Delete this text and replace it with your own content. Check your mailbox.</p><p> </p><p>asdf</p><p> </p><p>10</p><p> </p><p>Lavender.</p><p> </p><p>10/6 12:03</p><p> </p><p>5 09</p><p> </p><p>11 47</p><p> </p><p>Halloween is in October.</p><p> </p><p>jog</p>"
},
"highlight": {
"file.content": [
"\n <em>Topic10</em>\n\n Delete this text and replace it with your own content. Check your mailbox.\n\n  \n\n asdf\n\n  \n\n 10\n\n  \n\n Lavender.\n\n  \n\n 10/6 12:03\n\n  \n\n 5 09\n\n  \n\n 11 47\n\n  \n\n Halloween is in October.\n\n  \n\n jog\n\n "
]
}

Sub-records in Avro with Morphlines

I'm trying to convert JSON into Avro using the kite-sdk morphline module. After playing around I'm able to convert the JSON into Avro using a simple schema (no complex data types).
Then I took it one step further and modified the Avro schema, as displayed below (subrec.avsc). As you can see, the schema consists of a subrecord.
As soon as I tried to convert the JSON to Avro using the morphlines.conf and the subrec.avsc, it failed.
Somehow the JSON paths "/record_type[]/alert/action" are not translated by the toAvro function.
The morphlines.conf
morphlines : [
{
id : morphline1
importCommands : ["org.kitesdk.**"]
commands : [
# Read the JSON blob
{ readJson: {} }
{ logError { format : "record: {}", args : ["#{}"] } }
# Extract JSON
{ extractJsonPaths { flatten: false, paths: {
"/record_type[]/alert/action" : /alert/action,
"/record_type[]/alert/signature_id" : /alert/signature_id,
"/record_type[]/alert/signature" : /alert/signature,
"/record_type[]/alert/category" : /alert/category,
"/record_type[]/alert/severity" : /alert/severity
} } }
{ logError { format : "EXTRACTED THIS : {}", args : ["#{}"] } }
{ extractJsonPaths { flatten: false, paths: {
timestamp : /timestamp,
event_type : /event_type,
source_ip : /src_ip,
source_port : /src_port,
destination_ip : /dest_ip,
destination_port : /dest_port,
protocol : /proto,
} } }
# Create Avro according to schema
{ logError { format : "WE GO TO AVRO"} }
{ toAvro { schemaFile : /etc/flume/conf/conf.empty/subrec.avsc } }
# Create Avro container
{ logError { format : "WE GO TO BINARY"} }
{ writeAvroToByteArray { format: containerlessBinary } }
{ logError { format : "DONE!!!"} }
]
}
]
And the subrec.avsc
{
"type" : "record",
"name" : "Event",
"fields" : [ {
"name" : "timestamp",
"type" : "string"
}, {
"name" : "event_type",
"type" : "string"
}, {
"name" : "source_ip",
"type" : "string"
}, {
"name" : "source_port",
"type" : "int"
}, {
"name" : "destination_ip",
"type" : "string"
}, {
"name" : "destination_port",
"type" : "int"
}, {
"name" : "protocol",
"type" : "string"
}, {
"name": "record_type",
"type" : ["null", {
"name" : "alert",
"type" : "record",
"fields" : [ {
"name" : "action",
"type" : "string"
}, {
"name" : "signature_id",
"type" : "int"
}, {
"name" : "signature",
"type" : "string"
}, {
"name" : "category",
"type" : "string"
}, {
"name" : "severity",
"type" : "int"
}
] } ]
} ]
}
The { logError { format : "EXTRACTED THIS : {}", args : ["#{}"] } } command outputs the following:
[{
/record_type[]/alert/action = [allowed],
/record_type[]/alert/category = [],
/record_type[]/alert/severity = [3],
/record_type[]/alert/signature = [GeoIP from NL, Netherlands],
/record_type[]/alert/signature_id = [88006],
_attachment_body = [{
"timestamp": "2015-03-23T07:42:01.303046",
"event_type": "alert",
"src_ip": "1.1.1.1",
"src_port": 18192,
"dest_ip": "46.231.41.166",
"dest_port": 62004,
"proto": "TCP",
"alert": {
"action": "allowed",
"gid": "1",
"signature_id": "88006",
"rev": "1",
"signature": "GeoIP from NL, Netherlands ",
"category": "",
"severity": "3"
}
}],
_attachment_mimetype = [json/java + memory],
basename = [simple_eve.json]
}]
UPDATE 2017-06-22
You MUST populate the data in the structure in order for this to work, by using addValues or setValues:
{
addValues {
micDefaultHeader : [
{
eventTimestampString : "2017-06-22 18:18:36"
}
]
}
}
After debugging the sources of the morphline toAvro command, it appears that the record is the first object to be evaluated, no matter what you put in your mappings structure.
The solution is quite simple, but it unfortunately took a little extra time: Eclipse, running the Flume agent in debug mode, cloning the source code, and lots of coffee.
Here it goes.
My schema:
{
"type" : "record",
"name" : "co_lowbalance_event",
"namespace" : "co.tigo.billing.cboss.lowBalance",
"fields" : [ {
"name" : "dummyValue",
"type" : "string",
"default" : "dummy"
}, {
"name" : "micDefaultHeader",
"type" : {
"type" : "record",
"name" : "mic_default_header_v_1_0",
"namespace" : "com.millicom.schemas.root.struct",
"doc" : "standard millicom header definition",
"fields" : [ {
"name" : "eventTimestampString",
"type" : "string",
"default" : "12345678910"
} ]
}
} ]
}
Morphlines file:
morphlines : [
{
id : convertJsonToAvro
importCommands : ["org.kitesdk.**"]
commands : [
{
readJson {
outputClass : java.util.Map
}
}
{
addValues {
micDefaultHeader : [{}]
}
}
{
logDebug { format : "my record: {}", args : ["#{}"] }
}
{
toAvro {
schemaFile : /home/asarubbi/Development/test/co_lowbalance_event.avsc
mappings : {
"micDefaultHeader" : micDefaultHeader
"micDefaultHeader/eventTimestampString" : eventTimestampString
}
}
}
{
writeAvroToByteArray {
format : containerlessJSON
codec : null
}
}
]
}
]
The magic lies here:
{
addValues {
micDefaultHeader : [{}]
}
}
And in the mappings:
mappings : {
"micDefaultHeader" : micDefaultHeader
"micDefaultHeader/eventTimestampString" : eventTimestampString
}
Explanation:
Inside the code, the first field name that is evaluated is micDefaultHeader, of type RECORD. As there is no way to specify a default value for a RECORD (logically correct), the toAvro code evaluates it, does not get any value configured in the mappings, and therefore fails: it (wrongly) detects that the record is empty when it shouldn't.
However, taking a look at the code, you may see that it only requires a Map object, containing no values, to please the parser and continue to the next element.
So we add a map object using addValues and fill it with an empty map, [{}]. Note that this must match the name of the record that is giving you the empty-value error; in my case, "micDefaultHeader".
Feel free to comment if you have a better solution, as this looks like a "dirty fix".
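For reference, with the schema and morphlines file above, the containerlessJSON output for a record populated as in the UPDATE should look roughly like this (my reconstruction, not captured output):
{
  "dummyValue": "dummy",
  "micDefaultHeader": {
    "eventTimestampString": "2017-06-22 18:18:36"
  }
}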

Elasticsearch - completion suggester payload transforming, returns invalid JSON

I am trying to use the Elasticsearch completion suggester. I have app_user objects, which come into my Elasticsearch instance via a CouchDB river.
This is the mapping I use:
{
"app_user" : {
"_all" : {"enabled" : true},
"_source" : {
"includes" : [
"_id",
"_rev",
"type",
"profile.callname",
"profile.fullname",
"email"
]
},
"properties" : {
"suggest" : { "type" : "completion",
"index_analyzer" : "simple",
"search_analyzer" : "simple",
"payloads" : true
}
},
"transform" : [
{"script": "ctx._source.suggest = ['input':[ctx._source.email, ctx._source.profile.fullname, ctx._source.profile.callname]]"},
{"script": "ctx._source.suggest.payload = ['_id': ctx._source['_id'], 'type': ctx._source['type'], '_rev': ctx._source['_rev']]"}
,
{"script": "ctx._source.suggest.payload << ['label': ctx._source.profile.fullname, 'text': ctx._source.email]"}
]
}
}
So I am trying to include the object ID and a display text in the payload.
When I view the generated document via http://localhost:9200/myindex/app_user/<someid>?pretty&_source_transform, everything seems OK:
{
"_index": "myindex",
"_type": "app_user",
"_id": "<someid>",
"found": true,
"_source": {
"_rev": "2-dcd7b9d456e205d3e9d859fdc2c6a688",
"_id": "<someid>",
"email": "joni#example.org",
"suggest": {
"input": [
"joni#example.org",
...
],
"output": "joni surname - joni#example.org",
"payload": {
"_id": "<someid>",
"type": "app_user",
"_rev": "2-dcd7b9d456e205d3e9d859fdc2c6a688",
"label": "joni surname",
"text": "joni#example.org"
}
},
"type": "app_user",
"profile": {
"callname": "",
"fullname": "joni surname"
}
}
}
However, when I try to get the document via _suggest, the Elasticsearch API somehow breaks the JSON object payload:
curl -XGET "http://localhost:9200/myindex/_suggest" -d '{
"all-suggest": {
"text": "joni",
"completion": {
"field": "suggest"
}
}
}'
results in
{"_shards":{"total":5,"successful":5,"failed":0},"all-suggest":[{"text":"joni","offset":0,"length":5,"options":[{"text":"joni surname - joni#example.org","score":1.0,"payload"::)
�_id`<someid>�typeIapp_user�_reva2-dcd7b9d456e205d3e9d859fdc2c6a688�label�joni surname�textSjoni#example.org�}]}]}
which is definitely not valid JSON. Any ideas?
This is actually a bug in Elasticsearch. It was reported and acknowledged here, and should be fixed shortly.