How to write an Elasticsearch multiple must scripts query?

I want to use a query to compare multiple fields. I have fields 1 to 4. I want to search for data where field 1 is greater than field 2, and the query below works perfectly:
{
  "size": 0,
  "_source": [
    "field1",
    "field2",
    "field3",
    "field4"
  ],
  "sort": [],
  "query": {
    "bool": {
      "filter": [],
      "must": {
        "script": {
          "script": {
            "inline": "doc['field1'].value > doc['field2'].value;",
            "lang": "painless"
          }
        }
      }
    }
  }
}
Now I want to search for data where field 1 is greater than field 2 and also field 3 is greater than field 4. According to Elastic Search: How to write multi statement scripts? and this link, I just need to separate each statement with a semicolon. So it should be like this:
{
  "size": 0,
  "_source": [
    "field1",
    "field2",
    "field3",
    "field4"
  ],
  "sort": [],
  "query": {
    "bool": {
      "filter": [],
      "must": {
        "script": {
          "script": {
            "inline": "doc['field1'].value > doc['field2'].value; doc['field3'].value > doc['field4'].value;",
            "lang": "painless"
          }
        }
      }
    }
  }
}
But that query doesn't work and returns a compile error like this:
{"root_cause":[{"type":"script_exception","reason":"compile
error","script_stack":["doc['field1'].value > doc[' ...","^----
HERE"],"script":"doc['field1'].value > doc['field2'].value;
doc['field1'].value > doc['field2'].value;
","lang":"painless"}],"type":"search_phase_execution_exception","reason":"all
shards
failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"financials","node":"8SXaM2HcStelpLHvTDSMCQ","reason":{"type":"query_shard_exception","reason":"failed
to create query: {\n \"bool\" : {\n \"must\" : [\n {\n \"script\" :
{\n \"script\" : {\n \"source\" : \"doc['field1'].value >
doc['field2'].value; doc['field1'].value > doc['field2'].value; \",\n
\"lang\" : \"painless\"\n },\n \"boost\" : 1.0\n }\n }\n ],\n
\"adjust_pure_negative\" : true,\n \"boost\" : 1.0\n
}\n}","index_uuid":"hz12cHg1SkGwq712n6BUIA","index":"financials","caused_by":{"type":"script_exception","reason":"compile
error","script_stack":["doc['field1'].value > doc[' ...","^----
HERE"],"script":"doc['field1'].value > doc['field2'].value;
doc['field1'].value > doc['field2'].value;
","lang":"painless","caused_by":{"type":"illegal_argument_exception","reason":"Not
a statement."}}}}]}

You need to combine your two conditions into a single boolean expression, replacing the semicolon with &&:

doc['field1'].value > doc['field2'].value && doc['field3'].value > doc['field4'].value

A script query expects its script to evaluate to a single boolean. In Painless, a bare comparison is not a valid standalone statement (which is exactly what the "Not a statement." error is complaining about), so chaining the comparisons with && is how you express "both conditions must hold" in one script.
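For reference, here is the original query with only the inline script corrected (a minimal sketch, everything else unchanged):

{
  "size": 0,
  "_source": [
    "field1",
    "field2",
    "field3",
    "field4"
  ],
  "sort": [],
  "query": {
    "bool": {
      "filter": [],
      "must": {
        "script": {
          "script": {
            "inline": "doc['field1'].value > doc['field2'].value && doc['field3'].value > doc['field4'].value",
            "lang": "painless"
          }
        }
      }
    }
  }
}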

In order to use more than one condition, must, should and must_not can be used as arrays, with each condition becoming an element of the array. From the Elasticsearch documentation:
"query": {
"bool" : {
"must" : {
"term" : { "user" : "kimchy" }
},
"filter": {
"term" : { "tag" : "tech" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tag" : "wow" } },
{ "term" : { "tag" : "elasticsearch" } },
{ "term" : { "tag" : "and so on" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
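Applied back to the original question, the same idea would keep each script condition as its own clause in a must array (a sketch reusing the fields and script syntax from above):

{
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
              "inline": "doc['field1'].value > doc['field2'].value",
              "lang": "painless"
            }
          }
        },
        {
          "script": {
            "script": {
              "inline": "doc['field3'].value > doc['field4'].value",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}

Both clauses must match, so this behaves like the single && script; the combined expression runs one script per document, while separate clauses can be easier to read and reuse.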

Related

How can I bulk index a JSON file into Elasticsearch?

I am working through the Shakespeare tutorial for Elasticsearch. The Elasticsearch instance is set up on AWS, like so:
{
  "_shards" : {
    "total" : 40,
    "successful" : 20,
    "failed" : 0
  },
  "_all" : {
    "primaries" : { },
    "total" : { }
  },
  "indices" : {
    "rentalyield" : {
      "primaries" : { },
      "total" : { }
    },
    "yield" : {
      "primaries" : { },
      "total" : { }
    },
    "r" : {
      "primaries" : { },
      "total" : { }
    },
    "shakespeare" : {
      "primaries" : { },
      "total" : { }
    }
  }
}
I have created a mapping as follows:
curl -XPUT http://localhost:9200/shakespeare -d '
{
  "mappings" : {
    "_default_" : {
      "properties" : {
        "speaker" : { "type" : "string", "index" : "not_analyzed" },
        "play_name" : { "type" : "string", "index" : "not_analyzed" },
        "line_id" : { "type" : "integer" },
        "speech_number" : { "type" : "integer" }
      }
    }
  }
}
';
The shakespeare.json file is on my local machine and also on S3 on AWS, so I have tried:
curl -XPUT localhost:9200/_bulk --data-binary @shakespeare.json
which fails with this error message:
Warning: Couldn't read data from file "shakespeare.json", this makes an empty
Warning: POST.
{"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"parse_exception","reason":"Failed to derive xcontent"},"status":400}
and I have also tried:
curl -XPOST localhost:9200/shakespeare/type/_bulk?pretty=true --data-binary @https://s3-us-west-2.amazonaws.com/richjson/shakespeare.json
which produces the same error.
How can I do this correctly?
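The curl warning itself points at the cause: curl could not read shakespeare.json from the current directory, so it sent an empty body, which is what "Failed to derive xcontent" is complaining about. curl also cannot read --data-binary input from an S3 URL, so the file has to be local. A sketch of the usual tutorial invocation, assuming shakespeare.json sits in the working directory and already contains the bulk action lines the tutorial dataset ships with:

curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json

On Elasticsearch 6.0 and later you would also pass -H 'Content-Type: application/x-ndjson'.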

Custom analyzer appearing in type mapping but not working in Elasticsearch

I'm trying to add a custom analyzer to my index while also mapping that analyzer to a property on a type. Here is my JSON object for doing this:
{ "settings" : {
"analysis" : {
"analyzer" : {
"test_analyzer" : {
"type" : "custom",
"tokenizer": "standard",
"filter" : ["lowercase", "asciifolding"],
"char_filter": ["html_strip"]
}
}
}
},
"mappings" : {
"test" : {
"properties" : {
"checkanalyzer" : {
"type" : "string",
"analyzer" : "test_analyzer"
}
}
}
}
}
I know this analyzer works because I've tested it using /wp2/_analyze?analyzer=test_analyzer -d '<p>Testing analyzer.</p>' and also it shows up as the analyzer for the checkanalyzer property when I check /wp2/test/_mapping. However, if I add a document like {"checkanalyzer": "<p>The tags should not show up</p>"}, the HTML tags don't get stripped out when I retrieve the document using the _search endpoint. Am I misunderstanding how the mapping works or is there something wrong with my JSON object? I'm dynamically creating the wp2 index and also the test type when I make this call to Elasticsearch, not sure if that matters.
The HTML doesn't get removed from the source; it gets removed from the terms generated from that source. You can see this if you use a terms aggregation:
POST /test_index/_search
{
  "aggs": {
    "checkanalyzer_field_terms": {
      "terms": {
        "field": "checkanalyzer"
      }
    }
  }
}
{
  "took": 77,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test",
        "_id": "1",
        "_score": 1,
        "_source": {
          "checkanalyzer": "<p>The tags should not show up</p>"
        }
      }
    ]
  },
  "aggregations": {
    "checkanalyzer_field_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "not", "doc_count": 1 },
        { "key": "should", "doc_count": 1 },
        { "key": "show", "doc_count": 1 },
        { "key": "tags", "doc_count": 1 },
        { "key": "the", "doc_count": 1 },
        { "key": "up", "doc_count": 1 }
      ]
    }
  }
}
Here's some code I used to test it:
http://sense.qbox.io/gist/2971767aa0f5949510fa0669dad6729bbcdf8570
Now, if you want to completely strip out the HTML prior to indexing and store the content as-is, you can use the mapper attachment plugin: when you define the mapping, you can set the content_type to "html".
The mapper attachment plugin is useful for many things, especially if you are handling multiple document types, but most notably, using it just for the purpose of stripping out the HTML tags is sufficient (which you cannot do with the html_strip char filter, since that only affects the indexed terms, not the stored source).
Just a forewarning though: NONE of the HTML tags will be stored. So if you do need those tags somehow, I would suggest defining another field to store the original content. Another note: you cannot specify multi-fields for mapper attachment documents, so you would need to store that outside of the mapper attachment document. See my working example below.
You'll end up with a mapping like this:
{
  "html5-es" : {
    "aliases" : { },
    "mappings" : {
      "document" : {
        "properties" : {
          "delete" : {
            "type" : "boolean"
          },
          "file" : {
            "type" : "attachment",
            "fields" : {
              "content" : {
                "type" : "string",
                "store" : true,
                "term_vector" : "with_positions_offsets",
                "analyzer" : "autocomplete"
              },
              "author" : {
                "type" : "string",
                "store" : true,
                "term_vector" : "with_positions_offsets"
              },
              "title" : {
                "type" : "string",
                "store" : true,
                "term_vector" : "with_positions_offsets",
                "analyzer" : "autocomplete"
              },
              "name" : {
                "type" : "string"
              },
              "date" : {
                "type" : "date",
                "format" : "strict_date_optional_time||epoch_millis"
              },
              "keywords" : {
                "type" : "string"
              },
              "content_type" : {
                "type" : "string"
              },
              "content_length" : {
                "type" : "integer"
              },
              "language" : {
                "type" : "string"
              }
            }
          },
          "hash_id" : {
            "type" : "string"
          },
          "path" : {
            "type" : "string"
          },
          "raw_content" : {
            "type" : "string",
            "store" : true,
            "term_vector" : "with_positions_offsets",
            "analyzer" : "raw"
          },
          "title" : {
            "type" : "string"
          }
        }
      }
    },
    "settings" : { //insert your own settings here },
    "warmers" : { }
  }
}
In NEST, I assemble the content like this:
// Base64-encode the raw file bytes and flag the content as HTML so the
// attachment mapper strips the tags at index time.
Attachment attachment = new Attachment();
attachment.Content = Convert.ToBase64String(File.ReadAllBytes("path/to/document"));
attachment.ContentType = "html";

// Keep the untouched markup in a separate field (raw_content), since the
// attachment field will not preserve the tags.
Document document = new Document();
document.File = attachment;
document.RawContent = InsertRawContentFromString(originalText);
I have tested this in Sense - results are as follows:
"file": {
"_content": "PGh0bWwgeG1sbnM6TWFkQ2FwPSJodHRwOi8vd3d3Lm1hZGNhcHNvZnR3YXJlLmNvbS9TY2hlbWFzL01hZENhcC54c2QiPg0KICA8aGVhZCAvPg0KICA8Ym9keT4NCiAgICA8aDE+VG9waWMxMDwvaDE+DQogICAgPHA+RGVsZXRlIHRoaXMgdGV4dCBhbmQgcmVwbGFjZSBpdCB3aXRoIHlvdXIgb3duIGNvbnRlbnQuIENoZWNrIHlvdXIgbWFpbGJveC48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+YXNkZjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD4xMDwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5MYXZlbmRlci48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+MTAvNiAxMjowMzwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD41IDA5PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPjExIDQ3PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPkhhbGxvd2VlbiBpcyBpbiBPY3RvYmVyLjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5qb2c8L3A+DQogIDwvYm9keT4NCjwvaHRtbD4=",
"_content_length": 0,
"_content_type": "html",
"_date": "0001-01-01T00:00:00",
"_title": "Topic10"
},
"delete": false,
"raw_content": "<h1>Topic10</h1><p>Delete this text and replace it with your own content. Check your mailbox.</p><p> </p><p>asdf</p><p> </p><p>10</p><p> </p><p>Lavender.</p><p> </p><p>10/6 12:03</p><p> </p><p>5 09</p><p> </p><p>11 47</p><p> </p><p>Halloween is in October.</p><p> </p><p>jog</p>"
},
"highlight": {
"file.content": [
"\n <em>Topic10</em>\n\n Delete this text and replace it with your own content. Check your mailbox.\n\n  \n\n asdf\n\n  \n\n 10\n\n  \n\n Lavender.\n\n  \n\n 10/6 12:03\n\n  \n\n 5 09\n\n  \n\n 11 47\n\n  \n\n Halloween is in October.\n\n  \n\n jog\n\n "
]
}

Elastic query to show exact match OR other fields if not found

I need some help rewriting my elasticsearch query.
What I need is:
1. To show a single record if there is an exact match on the two fields verb and sessionid.raw (partial matches are not accepted).
"must": [
{ "match" : { "verb" : "login" } },
{ "term" : { "sessionid.raw" : strSessionID } },
]
OR
2. To show the top 5 records (sorted by _score DESC and @timestamp ASC) that match some other fields, giving a boost if the records fall within the specified time range.
"must": [
{ "match" : { "verb" : "login" } },
{ "term" : { "pid" : strPID } },
],
"should": [
{ "match" : { "user.raw" : strUser } },
{ "range" : { "#timestamp" : {
"from" : QueryFrom,
"to" : QueryTo,
"format" : DateFormatElastic,
"time_zone" : "America/Sao_Paulo",
"boost" : 2 }
} },
]
The code below almost does what I want.
Right now it boosts sessionid.raw to the top if found, but the remaining records are not being discarded.
var objQueryy = {
  "fields" : [ "@timestamp", "program", "pid", "sessionid.raw", "user", "frontendip", "frontendname", "_score" ],
  "size" : ItemsPerPage,
  "sort" : [ { "_score" : { "order" : "desc" } }, { "@timestamp" : { "order" : "asc" } } ],
  "query" : {
    "bool" : {
      "must" : [
        { "match" : { "verb" : "login" } },
        { "term" : { "pid" : strPID } },
        { "bool" : {
          "should" : [
            { "match" : { "user.raw" : strUser } },
            { "match" : { "sessionid.raw" : { "query" : strSessionID, "boost" : 3 } } },
            { "range" : { "@timestamp" : { "from" : QueryFrom, "to" : QueryTo, "format" : DateFormatElastic, "time_zone" : "America/Sao_Paulo" } } }
          ]
        } }
      ]
    }
  }
};
Elasticsearch cannot "prune" your secondary results for you when an exact match is also found.
You would have to implement this discarding functionality on the client side after all results had been returned.
You may find the cleanest implementation is to execute your two search strategies separately. Your search client would:
Run the first (exact match) query
Run the second (expanded) query only if no results found
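A sketch of that two-pass approach as plain search requests (the index name logs and the placeholder values are hypothetical; substitute your own):

# Pass 1: exact match only; if hits.total > 0, show that record and stop.
curl -XPOST 'localhost:9200/logs/_search' -d '{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        { "match": { "verb": "login" } },
        { "term": { "sessionid.raw": "SESSION_ID" } }
      ]
    }
  }
}'

# Pass 2: run only when pass 1 returned no hits.
curl -XPOST 'localhost:9200/logs/_search' -d '{
  "size": 5,
  "sort": [ { "_score": { "order": "desc" } }, { "@timestamp": { "order": "asc" } } ],
  "query": {
    "bool": {
      "must": [
        { "match": { "verb": "login" } },
        { "term": { "pid": "PID" } }
      ],
      "should": [
        { "match": { "user.raw": "USER" } },
        { "range": { "@timestamp": { "from": "FROM", "to": "TO", "boost": 2 } } }
      ]
    }
  }
}'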

Conversion from SQL to Elasticsearch query

I want to convert the following SQL query to an Elasticsearch JSON query:
select count(distinct(fk_id)), city_id from table
where status1 != "xyz" and status2 = "abc" and
cr_date >= "date1" and cr_date <= "date2" group by city_id
Also, is there any way of writing nested queries in Elasticsearch?
select * from table where status in (select status from table2)
The first query can be translated like this in the Elasticsearch query DSL:
curl -XPOST localhost:9200/table/_search -d '{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "status2": "abc"
              }
            },
            {
              "range": {
                "cr_date": {
                  "gt": "date1",     <--- don't forget to change the date
                  "lt": "date2"      <--- don't forget to change the date
                }
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "status1": "xyz"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "by_cities": {
      "terms": {
        "field": "city_id"
      },
      "aggs": {
        "fk_count": {
          "cardinality": {
            "field": "fk_id"
          }
        }
      }
    }
  }
}'
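As for the second SQL query: Elasticsearch has no joins or subqueries, so a SELECT ... WHERE status IN (SELECT status FROM table2) is usually done in two round trips from the client: first collect the distinct status values from table2, then feed them into a terms query against table. A sketch (the collected values below are placeholders):

curl -XPOST localhost:9200/table/_search -d '{
  "query": {
    "terms": {
      "status": ["status_value_1", "status_value_2"]
    }
  }
}'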
Using the SQL API in Elasticsearch, we can write SQL queries and also translate them to Elasticsearch queries:
POST /_sql/translate
{
  "query": "SELECT * FROM customer where address.Street='JanaChaitanya Layout' and Name='Pavan Kumar'"
}
The response for this is:
{
  "size" : 1000,
  "query" : {
    "bool" : {
      "must" : [
        {
          "term" : {
            "address.Street.keyword" : {
              "value" : "JanaChaitanya Layout",
              "boost" : 1.0
            }
          }
        },
        {
          "term" : {
            "Name.keyword" : {
              "value" : "Pavan Kumar",
              "boost" : 1.0
            }
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "Name",
      "address.Area",
      "address.Street"
    ],
    "excludes" : [ ]
  },
  "docvalue_fields" : [
    {
      "field" : "Age"
    }
  ],
  "sort" : [
    {
      "_doc" : {
        "order" : "asc"
      }
    }
  ]
}
Now we can use this result to query Elasticsearch. For further details, please go through this article:
https://xyzcoder.github.io/elasticsearch/2019/06/25/making-use-of-sql-rest-api-in-elastic-search-to-write-queries-easily.html

Elasticsearch: request an element in an array

I have a document indexed in Elasticsearch like:
{
  ...
  "purchase" : {
    "zones" : ["FR", "GB"],
    ...
  },
  ...
}
I use this kind of query to find, for example, documents with a purchase zone of "GB":
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "purchase.zones": "GB"
        }
      }
    }
  }
}
But with it I get no results.
I would like to perform a query like PHP's in_array("GB", purchase.zones).
Any help would be appreciated.
If your "purchase" field is nested type then you have to use nested query to access the "zones".
{
  "nested" : {
    "path" : "obj1",
    "score_mode" : "avg",
    "query" : {
      "bool" : {
        "must" : [
          {
            "match" : { "obj1.name" : "blue" }
          },
          {
            "range" : { "obj1.count" : { "gt" : 5 } }
          }
        ]
      }
    }
  }
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
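Adapted to the document in the question, the same pattern would look roughly like this (a sketch assuming purchase is mapped as nested):

{
  "query": {
    "nested": {
      "path": "purchase",
      "query": {
        "term": {
          "purchase.zones": "GB"
        }
      }
    }
  }
}

If purchase is just a regular object field, the original term filter on purchase.zones should already match, so the usual suspects are a nested mapping (handled as above) or an analyzed zones field, where the indexed term is the lowercased "gb" rather than "GB".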