Elasticsearch Query DSL - json

I get log files from my firewall which I want to filter for several strings.
However, each string always contains some other information, so I want to filter the whole string for specific words that are always present: "User", "authentication", "failed".
I tried this, but I do not get any data from it:
{
"query": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"gt": "now-15m"
}
}
},
{
"query_string": {
"query": "User AND authentication AND failed"
}
}
]
}
}
}
However, I cannot find the syntax for filtering on specific words within a string. Hopefully some of you can help me.
This is the message log (I want to filter on "event.original"): Screenshot

How to call stored painless script function in Elasticsearch

I am trying to use an example from
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/modules-scripting-using.html
I have created a function and saved it.
POST http://localhost:9200/_scripts/calculate-score
{
"script": {
"lang": "painless",
"source": "ctx._source.added + params.my_modifier"
}
}
Then I try to call the saved function:
POST http://localhost:9200/users/user/_search
{
"query": {
"script": {
"script": {
"id": "calculate-score",
"params": {
"my_modifier": 2
}
}
}
}
}
And it returns an error: Variable [ctx] is not defined. I tried to use doc['added'] but received the same error. Please help me understand how to call the function.
You should try using doc['added'].value; let me explain why and how. In short, it is because the Painless scripting language is rather simple but obscure.
Why can't ES find the ctx variable?
ES cannot find the ctx variable because this Painless script runs in "filter context", where that variable is not available. (If you are curious, there were 18 types of Painless context as of ES 6.4.)
In filter context there are only two variables available:
params (Map, read-only)
User-defined parameters passed in as part of the query.
doc (Map, read-only)
Contains the fields of the current document where each field is a List of values.
It should be enough to use doc['added'].value in your case:
POST /_scripts/calculate-score
{
"script": {
"lang": "painless",
"source": "doc['added'].value + params.my_modifier"
}
}
I say "should" because another problem appears if we try to execute it (exactly as you did):
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"doc['added'].value + params.my_modifier",
"^---- HERE"
],
"script": "calculate-score",
"lang": "painless",
"caused_by": {
"type": "class_cast_exception",
"reason": "cannot cast def [long] to boolean"
}
Because of its context, this script is expected to return a boolean:
Return
boolean
Return true if the current document should be returned as a
result of the query, and false otherwise.
At this point we can understand why the script you were trying to execute did not make much sense for Elasticsearch: it is supposed to tell if a document matches a script query or not. If a script returns an integer, Elasticsearch wouldn't know if it is true or false.
How to make a stored script work in filter context?
As an example we can use the following script:
POST /_scripts/calculate-score1
{
"script": {
"lang": "painless",
"source": "doc['added'].value > params.my_modifier"
}
}
Now we can access the script:
POST /users/user/_search
{
"query": {
"script": {
"script": {
"id": "calculate-score1",
"params": {
"my_modifier": 2
}
}
}
}
}
And it will return all documents where added is greater than 2:
"hits": [
{
"_index": "users",
"_type": "user",
"_id": "1",
"_score": 1,
"_source": {
"name": "John Doe",
"added": 40
}
}
]
This time the script returned a boolean and Elasticsearch managed to use it.
If you are curious, a range query can do the same job without scripting.
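To illustrate the remark about the range query, here is a minimal Python sketch that builds a request body equivalent to the calculate-score1 script. The helper name added_greater_than is made up for this illustration; the field added and the threshold 2 come from the example, and the body would be sent to the same /users/user/_search endpoint.

```python
import json


def added_greater_than(threshold):
    """Build a search body matching documents whose `added` field exceeds threshold.

    Hypothetical helper: it replaces the script query with a plain range query.
    """
    return {"query": {"range": {"added": {"gt": threshold}}}}


# Body to POST to /users/user/_search instead of the script query.
print(json.dumps(added_greater_than(2), indent=2))
```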
Why do I have to put .value after doc['added']?
If you try to access doc['added'] directly you may notice that the error message is different:
POST /_scripts/calculate-score
{
"script": {
"lang": "painless",
"source": "doc['added'] + params.my_modifier"
}
}
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"doc['added'] + params.my_modifier",
" ^---- HERE"
],
"script": "calculate-score",
"lang": "painless",
"caused_by": {
"type": "class_cast_exception",
"reason": "Cannot apply [+] operation to types [org.elasticsearch.index.fielddata.ScriptDocValues.Longs] and [java.lang.Integer]."
}
Once again Painless shows its obscurity: when accessing the field 'added' of the document, we obtain an instance of org.elasticsearch.index.fielddata.ScriptDocValues.Longs, which the Java Virtual Machine refuses to add to an integer (we can't blame Java here).
So we have to actually call the .getValue() method, which in Painless is written simply as .value.
What if I want to change that field in a document?
What if you want to add 2 to the field added of some document and save the updated document? The Update API can do this.
It operates in update context, which does have the ctx variable defined; ctx in turn gives access to the original JSON document via ctx['_source'].
We might create a new script:
POST /_scripts/add-some
{
"script": {
"lang": "painless",
"source": "ctx['_source']['added'] += params.my_modifier"
}
}
Now we can use it:
POST /users/user/1/_update
{
"script" : {
"id": "add-some",
"params" : {
"my_modifier" : 2
}
}
}
Why doesn't the example from the documentation work?
Apparently, because it is wrong. This script (from this documentation page):
POST _scripts/calculate-score
{
"script": {
"lang": "painless",
"source": "Math.log(_score * 2) + params.my_modifier"
}
}
is later executed in filter context (in a search request, in a script query), and, as we now know, there is no _score variable available there.
This script would only make sense in score context, when running a function_score query, which allows tweaking the relevance score of the documents.
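For comparison, this is roughly what a score-context call could look like. It is only a sketch: it assumes the documentation's calculate-score script has been stored as shown above, and it wraps a match_all query, which is my own choice for illustration. The Python helper name is hypothetical.

```python
import json


def score_with_stored_script(script_id, params):
    """Build a function_score search body that runs a stored script in score context.

    Assumption: the wrapped query is match_all; replace it with a real query.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "script_score": {
                    "script": {"id": script_id, "params": params}
                },
            }
        }
    }


body = score_with_stored_script("calculate-score", {"my_modifier": 2})
print(json.dumps(body, indent=2))
```

Here the script's return value is used as a number for scoring rather than as a boolean, which is why _score is meaningful in this context.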
Final note
I would like to mention that, in general, it is recommended to avoid scripts when a built-in query can do the job, because scripts are generally slower.

Named queries in Elasticsearch

I need to search through a database with 3 keywords (3 queries), and I need to tell the user which of the keywords (queries) gave a result.
I've been looking into Named Queries as a possible solution.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
I was wondering if it is possible to apply named queries to a nested query?
According to the documentation:
The search response will include for each hit the matched_queries it
matched on.
So I tried with just one simple query to see how it works. I got a result, but no "matched_queries". Did I do something wrong?
This is my query in Kibana (I'm not using the actual names):
GET database/document/_search
{
"query": {
"nested": {
"path": "first_path",
"query": {
"nested" : {
"path" : "second_path",
"query" : {
"match": {
"match_field": {
"query": "First query",
"_name" : "query"
}
}
}
}
},
"inner_hits": {}
}
}
}
From what I can see in
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
The search response will include for each hit the matched_queries it
matched on. The tagging of queries and filters only make sense for the
bool query.
It looks like you should use "bool" query in your inner-most query:
.
.
.
"query" : {
"bool": {
"should": {
…
}
}
}
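As a sketch of how the three-keyword case from the question might be written with a bool query, the Python below builds one named match clause per keyword and reads the names back from a hit. The helper names, the query names and the second and third keyword texts are all invented for illustration; only match_field and "First query" come from the question. Whether matched_queries is reported for clauses inside nested queries may depend on the Elasticsearch version, so treat this as a starting point rather than a confirmed fix.

```python
import json


def named_should_clauses(field, keywords_by_name):
    """Build a bool query with one named match clause per keyword.

    keywords_by_name maps a query name (reported back in matched_queries)
    to the text to search for. All names here are illustrative.
    """
    return {
        "bool": {
            "should": [
                {"match": {field: {"query": text, "_name": name}}}
                for name, text in keywords_by_name.items()
            ]
        }
    }


def matched_names(hit):
    """Return the query names a single search hit reports, or an empty list."""
    return hit.get("matched_queries", [])


inner = named_should_clauses(
    "match_field",
    {"first": "First query", "second": "Second query", "third": "Third query"},
)
print(json.dumps(inner, indent=2))
```

The resulting object would take the place of the inner-most "query" value, and matched_names can then be applied to each hit of the response.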

Elasticsearch match all tags within given array

I am currently developing a tag search application using Elasticsearch. I have given each document within the index an array of tags; here's an example of how a document looks:
_source: {
title: "Keep in touch scheme",
intro: "<p>hello this is a test</p> ",
full: " <p>again this is a test mate</p>",
media: "",
link: "/training/keep-in-touch",
tags: [
"employee",
"training"
]
}
I would like to be able to make a search and only return documents with all of the specified tags.
Using the above example, if I searched for a document with tags ["employee", "training"] then the above result would be returned.
In contrast, if I searched with tags ["employee", "other"], then nothing would be returned; all tags within the search query must match.
Currently I am doing:
query: {
bool: {
must: [
{ match: { tags: ["employee","training"] }}
]
}
}
but I just get exceptions like
IllegalStateException[Can't get text on a START_ARRAY at 1:128];
I have also tried concatenating the arrays and using comma-delimited strings; however, this seems to match anything as long as the first tag matches.
Any suggestions on how to approach this? Cheers
Option 1: The following example should work (v2.3.2):
curl -XPOST 'localhost:9200/yourIndex/yourType/_search?pretty' -d '{
"query": {
"bool": {
"must": [
{ "term": { "tags": "employee" } } ,
{ "term": { "tags": "training" } }
]
}
}
}'
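Since the list of tags usually comes from user input, the must clauses of Option 1 can be generated instead of written by hand. A minimal Python sketch, with a helper name of my own choosing:

```python
import json


def all_tags_query(tags):
    """Build a bool query that requires every tag in `tags` to be present."""
    return {
        "query": {
            "bool": {"must": [{"term": {"tags": tag}} for tag in tags]}
        }
    }


# Request body for the _search endpoint used in Option 1.
print(json.dumps(all_tags_query(["employee", "training"]), indent=2))
```

Note that term clauses are not analyzed, so this assumes the tags are indexed exactly as they are searched (for example, lowercase single words as in the sample document).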
Option 2: You can also try:
curl -XPOST 'localhost:9200/yourIndex/yourType/_search?pretty' -d '{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"terms": {
"tags": ["employee", "training"]
}
}
}
}
}'
But without "minimum_should_match": 1 it is not quite accurate.
I also found "execution": "and", but it is not accurate either.
Option 3: You can also try query_string; it works perfectly, but looks a little bit complicated:
curl -XPOST 'localhost:9200/yourIndex/yourType/_search?pretty' -d '{
"query" : {
"query_string": {
"query": "(tags:employee AND tags:training)"
}
}
}'
Maybe it will be helpful for you...
To ensure that the set contains only the specified values, maintain a secondary field that keeps track of the tag count. Then you can query as shown below to get the desired results:
"query":{
"bool":{
"must":[
{"term": {"tags": "employee"}},
{"term": {"tags": "training"}},
{"term": {"tag_count": 2}}
]
}
}
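The tag_count idea has two halves: the count has to be written at index time, and the query has to include it. The Python sketch below shows both under the assumption that tag_count stores the number of distinct tags; the helper names are hypothetical.

```python
import json


def with_tag_count(doc):
    """Return a copy of doc with tag_count set to the number of distinct tags."""
    updated = dict(doc)
    updated["tag_count"] = len(set(doc.get("tags", [])))
    return updated


def exact_tags_query(tags):
    """Match documents whose tags are exactly the given set (no extras)."""
    unique = sorted(set(tags))
    must = [{"term": {"tags": tag}} for tag in unique]
    # The extra clause rules out documents that carry additional tags.
    must.append({"term": {"tag_count": len(unique)}})
    return {"query": {"bool": {"must": must}}}


print(json.dumps(with_tag_count({"tags": ["employee", "training"]})))
print(json.dumps(exact_tags_query(["employee", "training"]), indent=2))
```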

Conversion from sql to elastic search query

How can I convert the following SQL query into an Elasticsearch query?
SELECT sum(`price_per_unit`*`quantity`) as orders
FROM `order_demormalize`
WHERE date(`order_date`)='2014-04-15'
You need to use scripts to compute the product of values. For newer versions of Elasticsearch, enable dynamic scripting by adding the line script.disable_dynamic: false to the elasticsearch.yml file. Note that this may open a security hole in your Elasticsearch cluster, so enable scripting judiciously. Try the query below:
POST <indexname>/<typename>/_search?search_type=count
{
"query": {
"filtered": {
"filter": {
"term": {
"order_date": "2014-04-15"
}
}
}
},
"aggs": {
"orders": {
"sum": {
"script": "doc['price_per_unit'].value * doc['quantity'].value"
}
}
}
}
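The request above targets old releases: the filtered query and search_type=count were both removed in Elasticsearch 5.0. As a rough sketch of how the same aggregation might be expressed on later versions, the Python below builds a body that uses "size": 0 and a bool filter instead. Two things are my own assumptions: that order_date is a date field which may carry a time component (hence a one-day range rather than a term, mirroring date(order_date) in the SQL), and the helper name. The script key is "source" on recent versions; some older releases used "inline".

```python
import json
from datetime import date, timedelta


def daily_order_total_body(day):
    """Build a search body summing price_per_unit * quantity for one calendar day.

    day is an ISO date string such as "2014-04-15".
    """
    start = date.fromisoformat(day)
    end = start + timedelta(days=1)
    return {
        "size": 0,  # return only the aggregation, no hits
        "query": {
            "bool": {
                "filter": {
                    "range": {
                        "order_date": {
                            "gte": start.isoformat(),
                            "lt": end.isoformat(),
                        }
                    }
                }
            }
        },
        "aggs": {
            "orders": {
                "sum": {
                    "script": {
                        "lang": "painless",
                        "source": "doc['price_per_unit'].value * doc['quantity'].value",
                    }
                }
            }
        },
    }


print(json.dumps(daily_order_total_body("2014-04-15"), indent=2))
```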

Lucene Multiple delete query (JSON)

I have a problem with a script that I wrote for Elasticsearch. On my server I have multiple log files that need to be deleted on a daily basis. To automate this process I wrote a Perl script that deletes my keep-alive log files.
Basically a curl -XDELETE.
But now I want to add a query to delete another log file.
Is it possible to add another JSON object without creating another delete variable?
So, adding something to my JSON that integrates a separate query that also deletes that log?
{
"query": {
"bool": {
"must": [
{
"range": {
"#timestamp": {
"to": "2014-08-24T00:00:00.000+01:00"
}
}
},
{
"query_string": {
"fields": [
"log_message"
],
"query": "keepAlive"
}
}
]
}
}
}
(Something like &&? Adding a second bool query?)
Because everything I add just over-specifies the query I have, leading to results I do not want.
Thank you
Not quite sure I've correctly understood what you're looking for, but it sounds like you want to combine the results of the given query with those of some other separate query. In that case, you can nest boolean queries as should clauses, something like:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"range": {
"#timestamp": {
"to": "2014-08-24T00:00:00.000+01:00"
}
}
},
{
"query_string": {
"fields": [
"log_message"
],
"query": "keepAlive"
}
}
]
}
},
{
**Another query here**
}
]
}
}
}
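The same nesting can be produced programmatically, which keeps a single request body however many separate queries are combined. This is a minimal Python sketch: the helper name any_of is made up, and the second query (a query_string for "heartbeat") is a purely hypothetical stand-in for whatever the other log line looks like.

```python
import json


def any_of(*queries):
    """Wrap independent queries so that a document matching any one of them is selected."""
    return {
        "query": {
            "bool": {"should": list(queries), "minimum_should_match": 1}
        }
    }


# First query: the keepAlive clause from the question.
keep_alive = {
    "bool": {
        "must": [
            {"range": {"#timestamp": {"to": "2014-08-24T00:00:00.000+01:00"}}},
            {"query_string": {"fields": ["log_message"], "query": "keepAlive"}},
        ]
    }
}

# Second query: hypothetical placeholder for the other log entry to delete.
other = {"query_string": {"fields": ["log_message"], "query": "heartbeat"}}

print(json.dumps(any_of(keep_alive, other), indent=2))
```

Each should clause keeps its own must conditions, so adding a query widens the selection instead of over-specifying the existing one.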