Unknown key for a START_OBJECT in multiple aggregations - Elasticsearch

I'm trying to build a query that runs multiple aggregations (on the same level, not sub-aggregations) in a single request. Here's the request I'm sending:
{
"index": "index20",
"type": "arret",
"body": {
"size": 0,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "anim fore",
"analyzer": "query_analyzer",
"type": "cross_fields",
"fields": [
"doc_id"
],
"operator": "and"
}
}
]
}
},
"aggs": {
"anim_fore": {
"terms": {
"field": "suggest_keywords.autocomplete",
"order": {
"_count": "desc"
},
"include": {
"pattern": "anim.*fore.*"
}
}
},
"fore": {
"terms": {
"field": "suggest_keywords.autocomplete",
"order": {
"_count": "desc"
},
"include": {
"pattern": "fore.*"
}
}
}
}
}
}
However, I'm getting the following error when executing this query:
Error: [parsing_exception] Unknown key for a START_OBJECT in [fore]., with { line=1 & col=1351 }
I've tried rewriting this query in many forms to make it work, but I always end up with this error. It seems really strange to me, as the query looks compatible with the format specified in the ES documentation.
Maybe there is something specific about terms aggregations, but I haven't been able to sort it out.

The error is in your include settings, which should simply be regular-expression strings rather than objects:
"aggs": {
"anim_fore": {
"terms": {
"field": "suggest_keywords.autocomplete",
"order": {
"_count": "desc"
},
"include": "anim.*fore.*" <--- here
}
},
"fore": {
"terms": {
"field": "suggest_keywords.autocomplete",
"order": {
"_count": "desc"
},
"include": "fore.*" <--- and here
}
}
}
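Here is a minimal Python sketch of the fix. It builds the two sibling terms aggregations with plain regex strings for include. The helper names terms_agg and included are hypothetical. included only approximates server-side matching: Elasticsearch uses Lucene regular-expression syntax, which differs from Python's re in places. Like re.fullmatch, though, it matches the pattern against the whole term, not a substring.

```python
import re


def terms_agg(field, include_regex):
    # include is a plain string, not an object such as {"pattern": ...}
    return {
        "terms": {
            "field": field,
            "order": {"_count": "desc"},
            "include": include_regex,
        }
    }


# Two aggregations on the same level, as in the question
aggs = {
    "anim_fore": terms_agg("suggest_keywords.autocomplete", "anim.*fore.*"),
    "fore": terms_agg("suggest_keywords.autocomplete", "fore.*"),
}


def included(term, include_regex):
    # Rough local check: the pattern must match the entire term (anchored),
    # so re.fullmatch is a closer analogue than re.search.
    return re.fullmatch(include_regex, term) is not None
```

For example, "the forest" is not included by "fore.*", because the pattern has to match from the first character of the term.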

You have trailing commas after doc_id and after the closing bracket of the must array; your query should look like this:
"must": [
{
"multi_match": {
"query": "anim fore",
"analyzer": "query_analyzer",
"type": "cross_fields",
"fields": [
"doc_id" // You have a trailing comma here
],
"operator": "and"
}
}
] // And here
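A strict JSON parser will flag trailing commas before the request reaches the cluster. Below is a minimal sketch using Python's standard json module, which rejects trailing commas in both arrays and objects. The helper name is_strict_json is hypothetical.

```python
import json


def is_strict_json(text):
    """Return True if text parses as strict JSON (trailing commas are rejected)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

Running a request body through a check like this before sending it separates plain syntax errors from real query-structure errors.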

Related

How to round up values in Elasticsearch query?

I am trying to set up an automated Kibana alert that takes in data from a defined extraction query. I get all the information I want; however, the response returns values without rounding them (up to 12 decimal places). Where in the extraction query, and with what, do I specify rounding for this value?
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"match_all": {
"boost": 1
}
},
{
"range": {
"#timestamp": {
"from": "{{period_end}}||-24h",
"to": "{{period_end}}",
"include_lower": true,
"include_upper": true,
"format": "epoch_millis",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"_source": {
"includes": [],
"excludes": []
},
"stored_fields": "*",
"docvalue_fields": [
{
"field": "#timestamp",
"format": "date_time"
},
{
"field": "timestamp",
"format": "date_time"
}
],
"script_fields": {},
"aggregations": {
"2": {
"terms": {
"field": "tag.country.keyword",
"size": 20,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"1": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"1": {
"avg": {
"field": "my_field"
}
}
}
}
}
}
Here, I'm talking about the "avg" aggregation at the very bottom. As I understand it, I should add a "script" key right below the "field" key that defines the rounding function I want to use. Can anybody help me come up with the correct function?
I'm not sure what to specify in the "script" key to make the rounding function work.
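The question is unanswered here. Below is a sketch of one possible approach, under stated assumptions.

A script added directly to avg acts as a value script: it would round each document's value before averaging, not the average itself. Instead, a bucket_script pipeline aggregation can sit next to avg "1" inside terms aggregation "2" and round its result with Painless Math.round.

This assumes a cluster recent enough to run Painless (5.x or later). The helper names and the output name "rounded" are hypothetical. The Python below builds that sub-aggregation and models the same half-up arithmetic locally.

```python
import math


def painless_style_round(x, decimals=2):
    # Models Math.round(x * 10^d) / 10^d.0 as written in Painless/Java:
    # Math.round rounds half up (floor(x + 0.5)), unlike Python's round(),
    # which rounds half to even.
    factor = 10 ** decimals
    return math.floor(x * factor + 0.5) / factor


def with_rounded_avg(terms_agg, avg_name="1", out_name="rounded", decimals=2):
    """Return a copy of terms_agg with a bucket_script that rounds avg_name."""
    factor = 10 ** decimals
    agg = dict(terms_agg)
    sub = dict(agg.get("aggregations", {}))  # copy, so the input is not mutated
    sub[out_name] = {
        "bucket_script": {
            "buckets_path": {"avg": avg_name},
            # Math.round returns a long, so divide by a double to keep decimals
            "script": f"Math.round(params.avg * {factor}) / {factor}.0",
        }
    }
    agg["aggregations"] = sub
    return agg
```

Each terms bucket would then carry a rounded.value alongside the original 1.value. The bucket ordering on "1" still uses the unrounded average.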

Error in term query parsing in Elasticsearch

I have the following query:
{
"aggs": {
"groupby": {
"terms": {
"field": "AMAZING LONG NAME THAT MAKES NO SENSE",
"missing": "",
"order": [
{
"_term": "asc"
}
],
"size": 10038
}
}
},
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"match": {
"AMAZING LONG NAME THAT MAKES NO SENSE": "Term1"
}
}
}
]
}
}
]
}
},
"size": 10
}
It raises a parsing_exception:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[term] query does not support [AMAZING LONG NAME THAT MAKES NO SENSE]",
"line": 1,
"col": 235
}
],
"type": "x_content_parse_exception",
"reason": "[1:235] [bool] failed to parse field [filter]",
"caused_by": {
"type": "x_content_parse_exception",
"reason": "[1:235] [bool] failed to parse field [must]",
"caused_by": {
"type": "parsing_exception",
"reason": "[term] query does not support [AMAZING LONG NAME THAT MAKES NO SENSE]",
"line": 1,
"col": 235
}
}
},
"status": 400
}
My question: is it the field name that should be entered inside match?
The term query syntax can be corrected as below:
POST demoindex/_search
{
"aggs": {
"groupby": {
"terms": {
"field": "AMAZING LONG NAME THAT MAKES NO SENSE",
"missing": "",
"order": [
{
"_term": "asc"
}
],
"size": 10038
}
}
},
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"AMAZING LONG NAME THAT MAKES NO SENSE": {
"value": "Term1"
}
}
}
]
}
}
]
}
},
"size": 10
}
The term query syntax is:
query -> term -> field name (to perform an exact match on) -> value
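The nesting above can be expressed as a small Python helper, for example when building request bodies for a client library. This is a minimal sketch: term_query and search_body are hypothetical names. It shows that the field name itself is the key under term; there is no "match" level. The short form {field: value} is also accepted by Elasticsearch.

```python
def term_query(field, value, short_form=False):
    # query -> term -> <field name> -> value
    inner = value if short_form else {"value": value}
    return {"term": {field: inner}}


def search_body(field, value):
    # Wraps the term clause the same way as the corrected query above
    return {
        "query": {
            "bool": {
                "filter": [{"bool": {"must": [term_query(field, value)]}}]
            }
        },
        "size": 10,
    }
```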

must match URL address returning a lot of documents - Elasticsearch

I'm simply trying to check how many documents have the same link value. There is something weird going on.
Let's say one or more documents has this link value: https://twitter.com/someUser/status/1288024417990144000
I search for it using this JSON query:
/theIndex/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"link": "https://twitter.com/someUser/status/1288024417990144000"
}
}
]
}
}
}
It returns 522 of 546 documents, with the first document being the correct one. It behaves more like a query_string than an exact must/match.
If I search a more unique field such as sha256sum:
{
"query": {
"bool": {
"must": [
{
"match": {
"sha256sum": "dad06b7a0a68a0eb879eaea6e4024ac7f97e38e6ac2b191afa7c363948270303"
}
}
]
}
}
}
It returns 1 document like it should.
I've tried a must/term query as well, but it returns 0 documents.
Mapping
{
"images": {
"aliases": {},
"mappings": {
"properties": {
"sha256sum": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"link": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "images",
"creation_date": "1593711063075",
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "keyword"
}
}
},
"number_of_replicas": "1",
"uuid": "a5zMwAYCQuW6U4R8POiaDw",
"version": {
"created": "7050199"
}
}
}
}
}
I wouldn't think such a simple issue would be so hard to fix. Am I just missing something right in front of my eyes?
Does anyone know what might be going on here?
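Even though I don't see the link field in your mapping (is it source?), I suspect it is a text field, and text fields are analyzed. The URL is split into several tokens, and a match query (default operator OR) returns every document that shares at least one of them. This is also why a term query on link returns nothing: the full URL is never stored as a single token. If you want an exact match, match on the link.keyword field instead, and it will behave as you expect: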
{
"query": {
"bool": {
"must": [
{
"match": {
"link.keyword": "https://twitter.com/someUser/status/1288024417990144000"   <-- add .keyword here
}
}
]
}
}
}
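To see why the analyzed field over-matches, here is a toy Python model. It is a deliberate simplification, not Elasticsearch's standard analyzer: toy_analyze just lowercases and splits on non-alphanumeric characters. match_or mimics the match query's default OR operator, under which any shared token counts as a hit. All function names are hypothetical.

```python
import re


def toy_analyze(text):
    # Simplified stand-in for an analyzer: lowercase, split on non-alphanumerics.
    # The real standard tokenizer uses Unicode word-boundary rules, so its
    # tokens differ in detail, but a URL still becomes several tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_or(query, stored_value):
    # text field + match query with default operator OR: one shared token is enough
    return bool(toy_analyze(query) & toy_analyze(stored_value))


def keyword_exact(query, stored_value):
    # keyword sub-field: the whole value is a single, unanalyzed token
    return query == stored_value


def exact_link_query(url):
    # The corrected request body from the answer
    return {"query": {"bool": {"must": [{"match": {"link.keyword": url}}]}}}
```

Any other tweet URL shares tokens such as "https", "twitter" and "status" with the query, so match_or accepts it, while keyword_exact accepts only the identical string.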

How to query an elasticsearch aggregation with a term and sum on different nested objects?

I have the following object whose value attribute is a nested object type:
{
"metadata": {
"tenant": "home",
"timestamp": "2016-03-24T23:59:38Z"
},
"value": [
{ "key": "foo", "int_value": 100 },
{ "key": "bar", "str_value": "taco" }
]
}
This type of object has the following mapping:
{
"my_index": {
"mappings": {
"my_doctype": {
"properties": {
"metadata": {
"properties": {
"tenant": {
"type": "string",
"index": "not_analyzed"
},
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
},
"value": {
"type": "nested",
"properties": {
"str_value": {
"type": "string",
"index": "not_analyzed"
},
"int_value": {
"type": "long"
},
"key": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
With this setup, I would like to perform an aggregation that produces the following result:
1. Perform a terms aggregation on the str_value attribute of objects whose key is set to "bar".
2. In each bucket created by the above aggregation, calculate the sum of the int_value attributes whose key is set to "foo".
3. Lay out the results in a date_histogram for a given time range.
With this goal in mind, I have been able to get the term and date_histogram aggregations to work on my nested objects, but have not had luck performing the second level of calculation. Here is the current query I am attempting to get working:
{
"query": {
"match_all": {}
},
"aggs": {
"filters": {
"filter": {
"bool": {
"must": [
{
"term": {
"metadata.org": "gw"
}
},
{
"range": {
"metadata.timestamp": {
"gte": "2016-03-24T00:00:00.000Z",
"lte": "2016-03-24T23:59:59.999Z"
}
}
}
]
}
},
"aggs": {
"intervals": {
"date_histogram": {
"field": "metadata.timestamp",
"interval": "1d",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-03-24T00:00:00Z",
"max": "2016-03-24T23:59:59Z"
},
"format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
},
"aggs": {
"nested_type": {
"nested": {
"path": "value"
},
"aggs": {
"key_filter": {
"filter": {
"term": {
"value.key": "bar"
}
},
"aggs": {
"groupBy": {
"terms": {
"field": "value.str_value"
},
"aggs": {
"other_nested": {
"reverse_nested": {
"path": "value"
},
"aggs": {
"key_filter": {
"filter": {
"term": {
"value.key": "foo"
}
},
"aggs": {
"amount_sum": {
"sum": {
"field": "value.int_value"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
The result I am expecting to receive from Elasticsearch would look like the following:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"filters": {
"doc_count": 2,
"intervals": {
"buckets": [
{
"key_as_string": "2016-03-24T00:00:00Z",
"key": 1458777600000,
"doc_count": 2,
"nested_type": {
"doc_count": 5,
"key_filter": {
"doc_count": 2,
"groupBy": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "taco",
"doc_count": 1,
"other_nested": {
"doc_count": 1,
"key_filter": {
"doc_count": 1,
"amount_sum": {
"value": 100.0
}
}
}
}
]
}
}
}
}
]
}
}
}
}
However, the innermost value (...groupBy.buckets.other_nested.key_filter.amount_sum) returns 0.0 instead of 100.0.
I think this is because nested objects are indexed as separate documents, so filtering on one key attribute's value doesn't let me query against another key.
Would anyone have any idea on how to get this type of query to work?
For a bit more context, I use this document structure because I do not control the content of the JSON documents that get indexed. Different tenants may therefore have conflicting key names with different values (e.g. {"tenant": "abc", "value": {"foo": "a"} } vs. {"tenant": "xyz", "value": {"foo": 1} }). The method I am trying to use is the one laid out in an Elasticsearch blog post, specifically its "Nested fields for each data type" section. The post recommends transforming objects you don't control into a structure you do control, using nested objects to help with this. I would also be open to learning about a better way to aggregate over documents whose JSON structure I don't control.
Thank you!
EDIT: I am using Elasticsearch 1.5.
Solved this situation by utilizing the reverse_nested aggregation in the correct way as described here: http://www.shayne.me/blog/2015/2015-05-18-elasticsearch-nested-docs/
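The linked post isn't reproduced here, so what follows is a hedged sketch of one commonly used fix, not the post's exact solution.

In the original query, reverse_nested points at "value", the nested level the aggregation is already on. This appears to keep the foo filter on the same "bar" nested document, which has no int_value, hence 0.0. Joining back to the root with an empty reverse_nested and then stepping into value again makes the sibling "foo" entries visible.

The Python below holds that groupBy subtree (to sit inside the existing key_filter) plus a pure-Python model of the intended result on the sample document. The sub-aggregation names are hypothetical.

```python
from collections import defaultdict

# groupBy subtree: reverse_nested with no "path" joins back to the root
# document; a second "nested" step re-enters the value objects so the
# "foo" entries can be filtered and summed.
GROUP_BY = {
    "terms": {"field": "value.str_value"},
    "aggs": {
        "back_to_root": {
            "reverse_nested": {},
            "aggs": {
                "value_again": {
                    "nested": {"path": "value"},
                    "aggs": {
                        "foo_only": {
                            "filter": {"term": {"value.key": "foo"}},
                            "aggs": {"amount_sum": {"sum": {"field": "value.int_value"}}},
                        }
                    },
                }
            },
        }
    },
}


def model_group_sum(docs, group_key="bar", sum_key="foo"):
    """Pure-Python model of what GROUP_BY computes; it does not call Elasticsearch."""
    totals = defaultdict(float)
    for doc in docs:
        # terms on value.str_value, restricted to nested objects with key == group_key
        groups = {v["str_value"] for v in doc["value"]
                  if v.get("key") == group_key and "str_value" in v}
        # back at the root document, sum int_value over its other nested objects
        amount = sum(v.get("int_value", 0) for v in doc["value"] if v.get("key") == sum_key)
        for g in groups:  # each root document counts once per bucket
            totals[g] += amount
    return dict(totals)
```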

Filter on '_index' the same way as '_type' in a search across multiple indices - Elasticsearch

I have two indices, index1 and index2, and both have two types with the same names, type1 and type2, in Elasticsearch (please assume we have a valid business reason for this).
I would like to search type1 in index1 and type2 in index2.
Here is my query:
POST _search
{
"query": {
"indices": {
"indices": ["index1","index2"],
"query": {
"filtered":{
"query":{
"multi_match": {
"query": "test",
"type": "cross_fields",
"fields": ["_all"]
}
},
"filter":{
"or":{
"filters":[
{
"terms":{
"_index":["index1"], // how can I make this work?
"_type": ["type1"]
}
},
{
"terms":{
"_index":["index2"], // how can I make this work?
"_type": ["type2"]
}
}
]
}
}
}
},
"no_match_query":"none"
}
}
}
You can use the indices and type filters inside a bool filter to filter on both index and type.
The query would look something along these lines:
POST index1,index2/_search
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "test",
"type": "cross_fields",
"fields": [
"_all"
]
}
},
"filter": {
"bool": {
"should": [
{
"indices": {
"index": "index1",
"filter": {
"type": {
"value": "type1"
}
},
"no_match_filter": "none"
}
},
{
"indices": {
"index": "index2",
"filter": {
"type": {
"value": "type2"
}
},
"no_match_filter": "none"
}
}
]
}
}
}
}
}
Passing the index names in the URL (for example, index1,index2/_search) is good practice; otherwise you risk executing the query across all indices in the cluster.
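The should clause in the answer repeats one indices filter per (index, type) pair, so it can be generated from a mapping. Below is a minimal Python sketch with hypothetical helper names. It reproduces the answer's 1.x-era syntax; the filtered query, the indices filter and mapping types were all removed in later Elasticsearch versions.

```python
def index_type_filter(pairs):
    """Build the answer's bool/should filter from an {index: type} mapping."""
    return {
        "bool": {
            "should": [
                {
                    "indices": {
                        "index": index,
                        "filter": {"type": {"value": doc_type}},
                        "no_match_filter": "none",
                    }
                }
                for index, doc_type in pairs.items()
            ]
        }
    }


def search_path(pairs):
    # Naming the indices in the URL keeps the search from hitting every index
    return ",".join(pairs) + "/_search"
```

With {"index1": "type1", "index2": "type2"}, search_path gives "index1,index2/_search", and the filter contains one should clause per index.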