Suggester search result not as expected in Elasticsearch - JSON

I am using Elasticsearch 6.1 and want to use its "Suggester" feature. I have indexed data in the format the suggester requires, using these queries:
PUT /hotels
{
  "mappings": {
    "hotel" : {
      "properties" : {
        "name" : { "type" : "keyword" },
        "city" : { "type" : "keyword" },
        "name_suggest" : {
          "type" : "completion"
        }
      }
    }
  }
}
PUT hotels/hotel/1
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : "Mercure Hotel Munich"
}
PUT /hotels/hotel/2
{
  "name" : "Hotel Monaco",
  "city" : "Munich",
  "name_suggest" : "Hotel Monaco"
}
PUT /hotels/hotel/3
{
  "name" : "Courtyard by Marriot Munich City",
  "city" : "Munich",
  "name_suggest" : "Courtyard by Marriot Munich City"
}
Then I run my search query:
POST http://localhost:9200/hotels/_search
{
  "suggest": {
    "name_suggest": {
      "text": "h",
      "completion": {
        "field": "name_suggest"
      }
    }
  }
}
In the result I only get document 2, "Hotel Monaco"; it does not suggest "Mercure Hotel Munich", even though that name also contains "Hotel".
I want both in my suggest result set.
I have tried with "prefix" as well.
Has anyone run into this? Please suggest a solution.
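For reference, the completion suggester only matches from the beginning of each indexed input string, so "h" matches "Hotel Monaco" but not "Mercure Hotel Munich". A minimal sketch of a common workaround (my own illustration, not part of the question) is to index several input variants per document so that later words also act as prefixes:
PUT /hotels/hotel/1
{
  "name" : "Mercure Hotel Munich",
  "city" : "Munich",
  "name_suggest" : {
    "input" : [ "Mercure Hotel Munich", "Hotel Munich", "Munich" ]
  }
}
With inputs like these, the same suggest query for "h" should return both hotels.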

Related

Creating Multiple QueueConfigurations in CloudFormation

I'm currently trying to write multiple QueueConfigurations into my CloudFormation template. Each is an SQS queue that is triggered when an object is created under a specified prefix. Here's what I have so far:
{
  "Resources": {
    "S3Bucket": {
      "Type" : "AWS::S3::Bucket",
      "Properties" : {
        "BucketName" : { "Ref" : "paramBucketName" },
        "LoggingConfiguration" : {
          "DestinationBucketName" : "test-bucket",
          "LogFilePrefix" : { "Fn::Join": [ "", [ { "Ref": "paramBucketName" }, "/" ] ] }
        },
        "NotificationConfiguration" : {
          "QueueConfigurations" : [{
            "Id" : "1",
            "Event" : "s3:ObjectCreated:*",
            "Filter" : {
              "S3Key" : {
                "Rules" : {
                  "Name" : "prefix",
                  "Value" : "folder1/"
                }
              }
            },
            "Queue" : "arn:aws:sqs:us-east-1:958262988361:interstate-cdc_feeder_prod_hvr_dev"
          }],
          "QueueConfigurations" : [{
            "Id" : "2",
            "Event" : "s3:ObjectCreated:*",
            "Filter" : {
              "S3Key" : {
                "Rules" : {
                  "Name" : "prefix",
                  "Value" : "folder2/"
                }
              }
            },
            "Queue" : "arn:aws:sqs:us-east-1:958262988361:interstate-latency_hvr_dev"
          }]
        }
      }
    }
  }
}
I've encountered the error Encountered unsupported property Id. I thought that by defining the Id I would be able to avoid the Duplicate object key error.
Does anyone know how to create multiple triggers in a single CloudFormation template? Thanks for the help in advance.
It should be structured like the below. There should be only one QueueConfigurations attribute, containing all of the queue configurations within it. Also, the Id parameter is not a valid property.
{
  "Resources": {
    "S3Bucket": {
      "Type" : "AWS::S3::Bucket",
      "Properties" : {
        "BucketName" : { "Ref" : "paramBucketName" },
        "LoggingConfiguration" : {
          "DestinationBucketName" : "test-bucket",
          "LogFilePrefix" : { "Fn::Join": [ "", [ { "Ref": "paramBucketName" }, "/" ] ] }
        },
        "NotificationConfiguration" : {
          "QueueConfigurations" : [{
            "Event" : "s3:ObjectCreated:*",
            "Filter" : {
              "S3Key" : {
                "Rules" : [{
                  "Name" : "prefix",
                  "Value" : "folder1/"
                }]
              }
            },
            "Queue" : "arn:aws:sqs:us-east-1:958262988361:interstate-cdc_feeder_prod_hvr_dev"
          },
          {
            "Event" : "s3:ObjectCreated:*",
            "Filter" : {
              "S3Key" : {
                "Rules" : [{
                  "Name" : "prefix",
                  "Value" : "folder2/"
                }]
              }
            },
            "Queue" : "arn:aws:sqs:us-east-1:958262988361:interstate-latency_hvr_dev"
          }]
        }
      }
    }
  }
}
There is more information about QueueConfiguration in the documentation.

The `type` is not updating when creating or updating index

I'm new to Kibana and Elasticsearch. I have a task to migrate data from our production site to staging. Currently, I have a simple snippet for creating an index.
I have successfully created the index, but upon comparing it with the production site, the type declared as date became text on my new site. We have noticed that all types are converting to text, and we are not sure whether it is because we are using a newer version of Kibana.
Here is how the field looks on the production site...
"authorizationDate": {
"type": "date",
"ignore_malformed": true,
"format": "yyyy/MM/dd||yyyy-MM-dd"
},
This is how I implemented it on the staging site...
POST /orders/_doc/1
{
  "order": {
    "properties": {
      "authorization": {
        "authorizationDate": {
          "type": "date",
          "ignore_malformed": true,
          "format": "yyyy/MM/dd||yyyy-MM-dd"
        }
      }
    }
  }
}
Upon checking with:
GET orders?pretty
the output of the orders mapping is:
"mappings" : {
"properties" : {
"order" : {
"properties" : {
"properties" : {
"properties" : {
"authorization" : {
"properties" : {
"authorizationDate" : {
"properties" : {
"format" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"ignore_malformed" : {
"type" : "boolean"
},
"type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}
}
}
}
},
The type became text instead of date, and the date format is not recorded.
Thanks in advance.
POST /orders/_doc/1 -- this creates a new index named orders with a default, dynamically inferred mapping.
When you run the request above, Elasticsearch treats "order", "properties" and "ignore_malformed" as document fields rather than as mapping definitions; hence you can see the multiple nested properties in the output mapping below.
"properties" : {
"properties" : {
"properties" : {
To create a new mapping, first you should run:
PUT orders   ---> create a new index named orders
{
  "mappings": {
    "properties": {
      "authorization": {
        "type": "object",   ---> should be object/nested; this was not present in your question
        "properties": {
          "authorizationDate": {
            "type": "date",
            "ignore_malformed": true,
            "format": "yyyy/MM/dd||yyyy-MM-dd"
          }
        }
      }
    }
  }
}
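As a quick sanity check (my addition, not part of the original answer), you can confirm that the mapping was applied before indexing any documents:
GET orders/_mapping
This should now show "authorizationDate" with "type" : "date" instead of the dynamically generated text fields.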
Then add a new document using:
POST /orders/_doc/1
{
  "authorization": {
    "authorizationDate": "2019-01-01"
  }
}
This will give the data below:
[
  {
    "_index" : "orders12",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "authorization" : {
        "authorizationDate" : "2019-01-01"
      }
    }
  }
]
Link to the [Mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html) documentation.

AWS gateway - 500 error as result of misalignment between payloads (‘method executions’) and desired resource model result

We are building a final resource tree for our AWS Gateway API composed of 4 different JSON models, each a subset of the previous one. The goal is to bring them all into one JSON payload and execute them together successfully.
The problem is knowing what the final JSON is going to look like – we’re close, but not entirely sure. When we execute via “Method Execution” we get a 500 (for a failed transformation). So, again, the goal is understanding what the final JSON transformation is supposed to look like.
The first object:
"type" : "object",
"properties" : {
"identifier" : {
"type" : "string",
"description" : "unique Identifier"
},
"templateData" : {
"$ref":"~path~/models/beneObjectTemplateData"
}
}
}
The second object is a subset of “beneObjectTemplateData”:
"properties" : {
"beneData" : {
"$ref":"~path~/models/beneData"
}
},
"description" : "Beneficiary specific data"
}
The third object is a subset of “beneData”:
"type" : "object",
"properties" : {
"beneSpecific" : {
"$ref":"~path~/models/beneSpecific"
}
},
"description" : "Beneficiary data"
}
And then the last object is a subset of “beneSpecific”:
{
  "type" : "object",
  "properties" : {
    "name" : {
      "type" : "string"
    },
    "classification" : {
      "type" : "string",
      "description" : "Type of beneficiary",
      "pattern" : "^(Business|Individual)$"
    },
    "accountNumber" : {
      "type" : "string",
      "description" : "Required for Wires"
    },
    "localAccountNumber" : {
      "type" : "string",
      "description" : "Required for iACH"
    }
  }
}
This is what we wrote out for the final transformation, but it isn’t working:
"properties": {
"identifier": "timetrail",
"templateData": {
"beneData": {
"beneSpecific": {
"name": "john",
"classification": "Individual",
"accountnumber": "3243244",
"localAccountNumber": ""
}
}
}
}
}
So, we are wondering what is going wrong here.
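For what it's worth, one common cause of a 500 from a failed transformation is sending the schema shape rather than a data instance. Assuming the four models compose as described above, a payload instance would drop the schema keywords ("properties", "type", "$ref") and simply nest the values; the following is only a sketch inferred from the schemas, not a confirmed fix:
{
  "identifier": "timetrail",
  "templateData": {
    "beneData": {
      "beneSpecific": {
        "name": "john",
        "classification": "Individual",
        "accountNumber": "3243244",
        "localAccountNumber": ""
      }
    }
  }
}
Note also that the beneSpecific model declares "accountNumber" with a capital N, while the attempted payload above uses "accountnumber".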

How to format the TSV file in Druid

I am trying to load a TSV into Druid using this ingestion spec (most up-to-date spec below):
{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "inputSpec" : {
        "type": "local",
        "baseDir": "quickstart",
        "filter": "test_data.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "local",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "hour",
        "queryGranularity" : "none",
        "intervals" : ["2016-07-18/2016-07-22"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : ["name", "email", "age"]
          },
          "timestampSpec" : {
            "format" : "yyyy-MM-dd HH:mm:ss",
            "column" : "date"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "type" : "doubleSum",
          "name" : "age",
          "fieldName" : "age"
        }
      ]
    }
  }
}
If my schema looks like this:
Schema: name email age
and the actual dataset looks like this:
name email age Bob Jones 23 Billy Jones 45
is this how the columns should be formatted in the above dataset for a TSV? That is, should name email age come first (the columns) and then the actual data? I am confused about how Druid will know how to map the columns to the actual dataset in TSV format.
TSV stands for tab-separated values, so it looks the same as CSV but you use tabs instead of commas, e.g.
Name<TAB>Age<TAB>Address
Paul<TAB>23<TAB>1115 W Franklin
Bessy the Cow<TAB>5<TAB>Big Farm Way
Zeke<TAB>45<TAB>W Main St
You use the first line as a header to define your column names, so you can use "name", "age" or "email" in the dimensions of your spec file.
As for GMT and UTC, they are basically the same:
There is no time difference between Greenwich Mean Time and Coordinated Universal Time.
The first one is a time zone, the other is a time standard.
By the way, don't forget to include a column with some time value in your TSV file!
So, for example, if you have a TSV file that looks like this:
"name" "position" "office" "age" "start_date" "salary"
"Airi Satou" "Accountant" "Tokyo" "33" "2016-07-16T19:20:30+01:00" "162700"
"Angelica Ramos" "Chief Executive Officer (CEO)" "London" "47" "2016-07-16T19:20:30+01:00" "1200000"
your spec file should look like this:
{
  "spec" : {
    "ioConfig" : {
      "inputSpec" : {
        "type": "local",
        "baseDir": "path_to_folder",
        "filter": "name_of_the_file(s)"
      }
    },
    "dataSchema" : {
      "dataSource" : "local",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "hour",
        "queryGranularity" : "none",
        "intervals" : ["2016-07-01/2016-07-28"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "tsv",
          "dimensionsSpec" : {
            "dimensions" : [
              "position",
              "age",
              "office"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "start_date"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "sum_sallary",
          "type" : "longSum",
          "fieldName" : "salary"
        }
      ]
    }
  }
}
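One caveat, which is an assumption on my part rather than something stated in the answer above: depending on the Druid version, the tsv parseSpec may also need the column order spelled out via "columns" (or "hasHeaderRow" : true, on versions that support it) so that Druid knows how to map header names to fields, roughly like:
"parseSpec" : {
  "format" : "tsv",
  "columns" : ["name", "position", "office", "age", "start_date", "salary"],
  "dimensionsSpec" : {
    "dimensions" : ["position", "age", "office"]
  },
  "timestampSpec" : {
    "format" : "auto",
    "column" : "start_date"
  }
}
Check the ingestion documentation for your Druid version to confirm which of these options it expects.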

Sub-records in Avro with Morphlines

I'm trying to convert JSON into Avro using the kite-sdk morphline module. After playing around, I'm able to convert the JSON into Avro using a simple schema (no complex data types).
Then I took it one step further and modified the Avro schema as displayed below (subrec.avsc). As you can see, the schema consists of a subrecord.
As soon as I tried to convert the JSON to Avro using the morphlines.conf and the subrec.avsc, it failed.
Somehow the JSON paths "/record_type[]/alert/action" are not translated by the toAvro function.
The morphlines.conf
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**"]
    commands : [
      # Read the JSON blob
      { readJson: {} }
      { logError { format : "record: {}", args : ["#{}"] } }
      # Extract JSON
      { extractJsonPaths { flatten: false, paths: {
        "/record_type[]/alert/action" : /alert/action,
        "/record_type[]/alert/signature_id" : /alert/signature_id,
        "/record_type[]/alert/signature" : /alert/signature,
        "/record_type[]/alert/category" : /alert/category,
        "/record_type[]/alert/severity" : /alert/severity
      } } }
      { logError { format : "EXTRACTED THIS : {}", args : ["#{}"] } }
      { extractJsonPaths { flatten: false, paths: {
        timestamp : /timestamp,
        event_type : /event_type,
        source_ip : /src_ip,
        source_port : /src_port,
        destination_ip : /dest_ip,
        destination_port : /dest_port,
        protocol : /proto,
      } } }
      # Create Avro according to schema
      { logError { format : "WE GO TO AVRO"} }
      { toAvro { schemaFile : /etc/flume/conf/conf.empty/subrec.avsc } }
      # Create Avro container
      { logError { format : "WE GO TO BINARY"} }
      { writeAvroToByteArray { format: containerlessBinary } }
      { logError { format : "DONE!!!"} }
    ]
  }
]
And the subrec.avsc
{
  "type" : "record",
  "name" : "Event",
  "fields" : [ {
    "name" : "timestamp",
    "type" : "string"
  }, {
    "name" : "event_type",
    "type" : "string"
  }, {
    "name" : "source_ip",
    "type" : "string"
  }, {
    "name" : "source_port",
    "type" : "int"
  }, {
    "name" : "destination_ip",
    "type" : "string"
  }, {
    "name" : "destination_port",
    "type" : "int"
  }, {
    "name" : "protocol",
    "type" : "string"
  }, {
    "name": "record_type",
    "type" : ["null", {
      "name" : "alert",
      "type" : "record",
      "fields" : [ {
        "name" : "action",
        "type" : "string"
      }, {
        "name" : "signature_id",
        "type" : "int"
      }, {
        "name" : "signature",
        "type" : "string"
      }, {
        "name" : "category",
        "type" : "string"
      }, {
        "name" : "severity",
        "type" : "int"
      } ]
    } ]
  } ]
}
The output of { logError { format : "EXTRACTED THIS : {}", args : ["#{}"] } } is the following:
[{
  /record_type[]/alert/action = [allowed],
  /record_type[]/alert/category = [],
  /record_type[]/alert/severity = [3],
  /record_type[]/alert/signature = [GeoIP from NL, Netherlands],
  /record_type[]/alert/signature_id = [88006],
  _attachment_body = [{
    "timestamp": "2015-03-23T07:42:01.303046",
    "event_type": "alert",
    "src_ip": "1.1.1.1",
    "src_port": 18192,
    "dest_ip": "46.231.41.166",
    "dest_port": 62004,
    "proto": "TCP",
    "alert": {
      "action": "allowed",
      "gid": "1",
      "signature_id": "88006",
      "rev": "1",
      "signature" : "GeoIP from NL, Netherlands ",
      "category" : "",
      "severity" : "3"
    }
  }],
  _attachment_mimetype=[json/java + memory],
  basename = [simple_eve.json]
}]
UPDATE 2017-06-22
You MUST populate the data in the structure in order for this to work, by using addValues or setValues:
{
  addValues {
    micDefaultHeader : [
      {
        eventTimestampString : "2017-06-22 18:18:36"
      }
    ]
  }
}
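A setValues variant would be written the same way; this is my assumption based on the command's documented behaviour of replacing (rather than appending to) the field's values, not something tested in this post:
{
  setValues {
    micDefaultHeader : [
      {
        eventTimestampString : "2017-06-22 18:18:36"
      }
    ]
  }
}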
After debugging the sources of the morphline toAvro command, it appears that the record is the first object to be evaluated, no matter what you put in your mappings structure.
The solution is quite simple, but unfortunately it took a little extra time: Eclipse, running the Flume agent in debug mode, cloning the source code, and lots of coffee.
Here it goes.
My schema:
{
  "type" : "record",
  "name" : "co_lowbalance_event",
  "namespace" : "co.tigo.billing.cboss.lowBalance",
  "fields" : [ {
    "name" : "dummyValue",
    "type" : "string",
    "default" : "dummy"
  }, {
    "name" : "micDefaultHeader",
    "type" : {
      "type" : "record",
      "name" : "mic_default_header_v_1_0",
      "namespace" : "com.millicom.schemas.root.struct",
      "doc" : "standard millicom header definition",
      "fields" : [ {
        "name" : "eventTimestampString",
        "type" : "string",
        "default" : "12345678910"
      } ]
    }
  } ]
}
The morphlines file:
morphlines : [
  {
    id : convertJsonToAvro
    importCommands : ["org.kitesdk.**"]
    commands : [
      {
        readJson {
          outputClass : java.util.Map
        }
      }
      {
        addValues {
          micDefaultHeader : [{}]
        }
      }
      {
        logDebug { format : "my record: {}", args : ["#{}"] }
      }
      {
        toAvro {
          schemaFile : /home/asarubbi/Development/test/co_lowbalance_event.avsc
          mappings : {
            "micDefaultHeader" : micDefaultHeader
            "micDefaultHeader/eventTimestampString" : eventTimestampString
          }
        }
      }
      {
        writeAvroToByteArray {
          format : containerlessJSON
          codec : null
        }
      }
    ]
  }
]
The magic lies here:
{
  addValues {
    micDefaultHeader : [{}]
  }
}
And in the mappings:
mappings : {
  "micDefaultHeader" : micDefaultHeader
  "micDefaultHeader/eventTimestampString" : eventTimestampString
}
Explanation: inside the code, the first field name that is evaluated is micDefaultHeader, which is of type RECORD. As there is no way to specify a default value for a RECORD (logically correct), the toAvro code evaluates it, does not find any value configured in the mappings, and therefore fails, as it (wrongly) detects that the record is empty when it shouldn't be.
However, taking a look at the code, you can see that it only requires a Map object containing no values to please the parser and continue to the next element.
So we add a map object using addValues and fill it with an empty map, [{}]. Note that this must match the name of the record that is giving you an empty value; in my case, "micDefaultHeader".
Feel free to comment if you have a better solution, as this looks like a "dirty fix".