Elasticsearch bucket aggregation using concatenated parameter - json

I'm using the Elasticsearch API, and the schema of the document is as follows:
{
name: "",
born_year: "",
born_month: "",
born_day: "",
book_type: "",
price: <some number>,
country: ""
}
Now I need the document count per name for people born before 1995 (born_year + born_month + born_day < "20051220"). How can I achieve this?
I tried this:
{
"query": {
"query_string": {
"query": "country:\"SL\""
}
},
"size": 0,
"aggs": {
"total": {
"terms": {
"field": "name"
}
}
}
}
But I have no idea how to add a filter for the birthday.

As mentioned by @val, you need to add a real date field, which you can easily create by concatenating these three fields at index time.
There are two ways to filter on that date range, and they return different result sets; which level you filter at is your choice.
You mentioned querying on the country field, but not at what level you want to apply the date range filter, so I will give you queries for both cases.
Mapping, assuming you create a date field:
{
name:"",
born_year:"",
born_month:"",
born_day:"",
book_type:"",
price:<some number>,
country:"",
date : ""
}
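A minimal sketch of building that date field at index time, assuming born_year, born_month and born_day hold numeric strings such as "1991", "7", "4". The helper name add_date_field is hypothetical. It emits an ISO-8601 yyyy-MM-dd string, which a default Elasticsearch date mapping should accept, and Python's isoformat() zero-pads month and day automatically.

```python
from datetime import date


def add_date_field(doc: dict) -> dict:
    """Return a copy of doc with a 'date' field built from the born_* fields.

    Hypothetical helper: assumes the three fields are numeric strings (or ints)
    that form a valid calendar date; an invalid date raises ValueError.
    """
    born = date(int(doc["born_year"]), int(doc["born_month"]), int(doc["born_day"]))
    # isoformat() always zero-pads, e.g. 1991-07-04
    return {**doc, "date": born.isoformat()}


example = {"name": "alice", "born_year": "1991", "born_month": "7", "born_day": "4", "country": "SL"}
print(add_date_field(example)["date"])  # 1991-07-04
```

Doing the concatenation once, before indexing, lets the range queries below work on a real date field instead of scripting over three strings at search time.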
Case 1) Filter the date range for the name aggregation only. Here the document count (hits) will not be affected by the date range filter.
{
"query": {
"query_string": {
"query": "country:\"SL\""
}
},
"aggs": {
"total": {
"filter": {
"range": {
"date": {
"gte": "your_date_min",
"lte": "your_date_max"
}
}
},
"aggs": {
"NAME": {
"terms": {
"field": "name",
"size": 10
}
}
}
}
}
}
Case 2) Both your document count and the aggregation are filtered by the date range, because the range filter is added at the query level.
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "country:\"SL\""
}
},
{
"range": {
"date": {
"gte": "your_date_min",
"lte": "your_date_max"
}
}
}
]
}
},
"aggs": {
"total": {
"terms": {
"field": "name",
"size": 10
}
}
}
}
So adding the filter to the aggregation affects only the aggregation counts.
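To make the difference between the two cases concrete, here is a plain-Python simulation (it does not call Elasticsearch) over a few hypothetical documents that already carry the date field. case1 mimics the filter aggregation, where the hits ignore the range, and case2 mimics the range in the query, where both hits and buckets are restricted.

```python
from collections import Counter

# Hypothetical sample documents, already carrying the concatenated 'date' field.
DOCS = [
    {"name": "alice", "country": "SL", "date": "1990-03-01"},
    {"name": "alice", "country": "SL", "date": "2001-05-09"},
    {"name": "bob", "country": "SL", "date": "1985-11-20"},
    {"name": "carol", "country": "IN", "date": "1980-01-01"},
]


def in_range(doc, date_min, date_max):
    # ISO yyyy-MM-dd strings sort chronologically, so string comparison works here
    return date_min <= doc["date"] <= date_max


def case1(docs, date_min, date_max):
    """Range in a 'filter' aggregation: hit count ignores it, name buckets honor it."""
    hits = [d for d in docs if d["country"] == "SL"]
    buckets = Counter(d["name"] for d in hits if in_range(d, date_min, date_max))
    return len(hits), dict(buckets)


def case2(docs, date_min, date_max):
    """Range in the query: both the hit count and the name buckets are restricted."""
    hits = [d for d in docs if d["country"] == "SL" and in_range(d, date_min, date_max)]
    buckets = Counter(d["name"] for d in hits)
    return len(hits), dict(buckets)


print(case1(DOCS, "1900-01-01", "1994-12-31"))  # (3, {'alice': 1, 'bob': 1})
print(case2(DOCS, "1900-01-01", "1994-12-31"))  # (2, {'alice': 1, 'bob': 1})
```

The buckets are identical in both cases; only the hit count differs, which is exactly the trade-off between the two queries.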
Edit:
Approach 1) With a Groovy script, concatenate the strings, parse the result to an integer, and compare it with your input date.
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"inline": "(doc['born_year'].value + doc['born_month'].value + doc['born_day'].value).toInteger() < param1",
"params": {
"param1": 19911122
}
}
}
}
}
}
}
When indexing, make sure single-digit days and months are zero-padded (e.g. index 6 as 06).
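Why the padding matters can be shown with a small Python stand-in for the script's concatenate-then-toInteger() step. The function birth_number is hypothetical; pad=False reproduces what happens when unpadded values are indexed.

```python
def birth_number(year, month, day, pad=True):
    """Mimic the Groovy script: concatenate year + month + day, then parse to int.

    pad=True zero-pads month and day to two digits, as recommended above.
    """
    month, day = str(month), str(day)
    if pad:
        month, day = month.zfill(2), day.zfill(2)
    return int(str(year) + month + day)


# Without padding, 1 Oct 1991 sorts *before* 15 Sep 1991:
print(birth_number(1991, 10, 1, pad=False), birth_number(1991, 9, 15, pad=False))
# -> 1991101 1991915
# With padding, the order is chronological:
print(birth_number(1991, 10, 1), birth_number(1991, 9, 15))
# -> 19911001 19910915
```

Unpadded values also produce 6- or 7-digit numbers, which are always smaller than an 8-digit parameter such as 19911122, so every such document would look "born before" the cutoff.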
Approach 2) Parse the strings into an exact date (preferred):
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"inline": "Date.parse('dd-MM-yyyy', doc['born_day'].value + '-' + doc['born_month'].value + '-' + doc['born_year'].value) < Date.parse('dd-MM-yyyy', param1)",
"params": {
"param1": "04-05-1991"
}
}
}
}
}
}
}
The second approach is better because you don't have to keep each field (day, month, year) zero-padded just so that the concatenation parses to a comparable integer.
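The same idea in Python, as an analogue of the Groovy Date.parse approach rather than the script itself: parse the day, month and year into a real date and compare dates. It also shows why the comparison must be done on parsed dates, because dd-MM-yyyy strings compared lexicographically are not in chronological order. parse_birth is a hypothetical helper.

```python
from datetime import date, datetime


def parse_birth(day: str, month: str, year: str) -> date:
    """Parse day/month/year strings into a real date (hypothetical helper).

    strptime's %d and %m also accept unpadded values such as "4" or "7".
    """
    return datetime.strptime(f"{day}-{month}-{year}", "%d-%m-%Y").date()


cutoff = parse_birth("04", "05", "1991")  # same value as param1 = "04-05-1991"
born = parse_birth("4", "7", "1990")      # 4 July 1990, unpadded input

# Lexicographic comparison of dd-MM-yyyy strings puts 1990 *after* 1991:
print("04-07-1990" > "04-05-1991")  # True (chronologically wrong)
# Comparing parsed dates is chronological:
print(born < cutoff)                 # True: born before 4 May 1991
```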

Related

ElasticSearch multiple terms search json

I have a ton of items in a Db with many columns. I need to search across two of these columns to get one data set.
The first column, genericCode, would group together any of the rows that have that code.
The second column, genericId, calls out specific rows to add because they are missing from the list of genericCodes I'm looking for.
The C# back end builds the JSON for me as follows, but it returns nothing.
{
"from": 0,
"size": 50,
"aggs": {
"paramA": {
"nested": {
"path": "paramA"
},
"aggs": {
"names": {
"terms": {
"field": "paramA.name",
"size": 10
},
"aggs": {
"specValues": {
"terms": {
"field": "paramA.value",
"size": 10
}
}
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"term": {
"locationId": {
"value": 1
}
}
},
{
"terms": {
"genericCode": [
"10A",
"20B"
]
}
},
{
"terms": {
"genericId": [
11223344
]
}
}
]
}
}
}
I get an empty result set. If I remove either of the "terms" clauses, I get what I would expect. So I just need to combine those terms into one search.
I've gone through a lot of the documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html
and still can't seem to find what I'm looking for.
Let's say I'm Jelly Belly and I want to create a bag of jelly beans with all the Star Wars and Disney jelly beans, and I also want to add all beans of the color green. That is basically what I'm trying to do.
EDIT: Changing "must" to "should" isn't quite right either. I need it to be (in pseudo-SQL):
SELECT *
FROM myTable
WHERE locationId = 1
AND (
genericCode IN ('this', 'that')
OR
genericId IN (1234, 5678)
)
The locationId separates our data in an important way.
I found this post: elasticsearch bool query combine must with OR
and it has gotten me closer, but not all the way there...
I've tried several iterations of should > must > should > must building this query and get varying results, but nothing accurate.
Here is the query that is working. It helped when I realized I was passing in the wrong data for one of my parameters. doh
Nest the should inside the must, as @khachik noted in the comment above. I had this some time ago, but it wasn't working due to the above blunder.
{
"from": 0,
"size": 10,
"aggs": {
"specs": {
"nested": {
"path": "paramA"
},
"aggs": {
"names": {
"terms": {
"field": "paramA.name",
"size": 10
},
"aggs": {
"specValues": {
"terms": {
"field": "paramA.value",
"size": 10
}
}
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"term": {
"locationId": {
"value": 1
}
}
},
{
"bool": {
"should": [
{
"terms": {
"genericCode": [
"101",
"102"
]
}
},
{
"terms": {
"genericId": [
3078711,
3119430
]
}
}
]
}
}
]
}
}
}
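The difference between the original all-must query and the working must-plus-nested-should query can be sketched in plain Python over a few hypothetical rows (no Elasticsearch involved). A bool that contains only should clauses, like the inner one above, requires at least one of them to match.

```python
# Hypothetical rows standing in for the indexed documents.
ROWS = [
    {"locationId": 1, "genericCode": "101", "genericId": 1},
    {"locationId": 1, "genericCode": "999", "genericId": 3078711},
    {"locationId": 1, "genericCode": "999", "genericId": 2},
    {"locationId": 2, "genericCode": "101", "genericId": 3},
]

CODES = ["101", "102"]
IDS = [3078711, 3119430]


def all_must(row):
    """Original query: every clause in 'must', i.e. location AND code AND id."""
    return row["locationId"] == 1 and row["genericCode"] in CODES and row["genericId"] in IDS


def must_with_nested_should(row):
    """Working query: location AND (code OR id)."""
    return row["locationId"] == 1 and (row["genericCode"] in CODES or row["genericId"] in IDS)


print([r["genericId"] for r in ROWS if all_must(r)])                 # []
print([r["genericId"] for r in ROWS if must_with_nested_should(r)])  # [1, 3078711]
```

No single row has both a matching code and a matching id, which is why the all-must version came back empty.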

Do I need to merge two Elasticsearch queries or can I use an or-type operator?

I have two Elasticsearch queries (which I use via the elastic package in R).
One query gathers the number of times a feature is loaded, the other gathers the number of times a feature is unloaded.
My needs have now changed in that I need to gather both types of data/states together, in the same dataset (the state can either be TRUE or FALSE and I want to gather both in the same dataset).
What I want to do: identify both cases, where visible is TRUE and where it is FALSE.
Therefore, I want to know what the best approach is: should I (attempt to) merge the queries or I should use an or-type operator?
If it is the latter, how would I go about it?
For completeness, here are my minified queries (unminified versions are at the end of this question):
loads_body <- '{"size":0,"query":{"bool":{"must":[{"match":{"merchant":"a6xzTHtpQs"}},{"term":{"visible":true}},{"range":{"time":{"gte":"2018-04-02T06:00:00","lte":"2018-04-03T05:59:59","time_zone":"+00:00"}}}]}},"aggs":{"daily":{"date_histogram":{"field":"time","interval":"hour","time_zone":"+00:00","min_doc_count":0,"extended_bounds":{"min":"2018-04-02T06:00:00","max":"2018-04-03T05:59:59"}}}}}'
and
unloads_body <- '{"size":0,"query":{"bool":{"must":[{"match":{"merchant":"a6xzTHtpQs"}},{"term":{"visible":false}},{"range":{"time":{"gte":"2018-04-02T06:00:00","lte":"2018-04-03T05:59:59","time_zone":"+00:00"}}}]}},"aggs":{"daily":{"date_histogram":{"field":"time","interval":"hour","time_zone":"+00:00","min_doc_count":0,"extended_bounds":{"min":"2018-04-02T06:00:00","max":"2018-04-03T05:59:59"}}}}}'
Unminified queries:
loads_body <- '{
"size":0,
"query": {
"bool": {
"must":[ {
"match": {
"merchant": "a6xzTHtpQs"
}
}
,
{
"term": {
"visible": true
}
}
,
{
"range": {
"time": {
"gte": "2018-04-02T06:00:00", "lte": "2018-04-03T05:59:59", "time_zone": "+00:00"
}
}
}
]
}
}
,
"aggs": {
"daily": {
"date_histogram": {
"field":"time",
"interval":"hour",
"time_zone":"+00:00",
"min_doc_count":0,
"extended_bounds": {
"min": "2018-04-02T06:00:00", "max": "2018-04-03T05:59:59"
}
}
}
}
}'
and
unloads_body <- '{
"size":0,
"query": {
"bool": {
"must":[ {
"match": {
"merchant": "a6xzTHtpQs"
}
}
,
{
"term": {
"visible": false
}
}
,
{
"range": {
"time": {
"gte": "2018-04-02T06:00:00", "lte": "2018-04-03T05:59:59", "time_zone": "+00:00"
}
}
}
]
}
}
,
"aggs": {
"daily": {
"date_histogram": {
"field":"time",
"interval":"hour",
"time_zone":"+00:00",
"min_doc_count":0,
"extended_bounds": {
"min": "2018-04-02T06:00:00", "max": "2018-04-03T05:59:59"
}
}
}
}
}'
Yes, you can use a single query with sub-aggregations to do what you are looking for. Something along the lines of:
{
"query":{
"bool":{
"must":[
{
"match":{
"merchant":"a6xzTHtpQs"
}
},
{
"range":{
"time":{
"gte":"2018-04-02T06:00:00",
"lte":"2018-04-03T05:59:59",
"time_zone":"+00:00"
}
}
}
]
}
},
"aggs":{
"Visible_agg":{
"terms":{
"field":"visible"
},
"aggs":{
"daily":{
"date_histogram":{
"field":"time",
"interval":"hour",
"time_zone":"+00:00",
"min_doc_count":0,
"extended_bounds":{
"min":"2018-04-02T06:00:00",
"max":"2018-04-03T05:59:59"
}
}
}
}
}
}
}
This should produce the histograms in two buckets: one for "visible": true and the other for "visible": false.
Is this what you are looking for?
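To get back the two series the separate loads/unloads queries used to return, walk the terms buckets and their daily sub-buckets. This is a Python sketch over a hypothetical, trimmed response; the same walk should translate directly to the nested lists the R client gives you. Boolean terms buckets carry key 1/0 with key_as_string "true"/"false".

```python
# Hypothetical, trimmed response; real responses contain more keys and buckets.
SAMPLE = {
    "aggregations": {
        "Visible_agg": {
            "buckets": [
                {"key": 1, "key_as_string": "true", "doc_count": 7,
                 "daily": {"buckets": [
                     {"key_as_string": "2018-04-02T06:00:00.000Z", "doc_count": 5},
                     {"key_as_string": "2018-04-02T07:00:00.000Z", "doc_count": 2},
                 ]}},
                {"key": 0, "key_as_string": "false", "doc_count": 3,
                 "daily": {"buckets": [
                     {"key_as_string": "2018-04-02T06:00:00.000Z", "doc_count": 0},
                     {"key_as_string": "2018-04-02T07:00:00.000Z", "doc_count": 3},
                 ]}},
            ]
        }
    }
}


def split_by_visible(response):
    """Return {'true': {hour: count}, 'false': {hour: count}} from the terms buckets."""
    result = {}
    for bucket in response["aggregations"]["Visible_agg"]["buckets"]:
        # Fall back to the raw key if key_as_string is absent
        state = bucket.get("key_as_string", str(bucket["key"]))
        result[state] = {h["key_as_string"]: h["doc_count"] for h in bucket["daily"]["buckets"]}
    return result


loads = split_by_visible(SAMPLE)["true"]     # what loads_body used to return
unloads = split_by_visible(SAMPLE)["false"]  # what unloads_body used to return
print(loads["2018-04-02T07:00:00.000Z"], unloads["2018-04-02T07:00:00.000Z"])  # 2 3
```

If one state has no documents in the time window at all, its terms bucket is simply missing, so check for the key before indexing into the result.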

Query elasticsearch match return inexact document

I'm trying to retrieve random documents that contain #maga, so I wrote the following query:
{
"_source": "text",
"query": {
"function_score": {
"query": {
"match": {
"text": "#maga"
}
},
"functions": [
{
"random_score": {}
}
]
}
}
}
The problem is that some returned documents don't contain #maga, just the token maga. Why is that, and how can I overcome this problem?

How to perform AND condition in elasticsearch query?

I have the following query, where I want to query indexname for ID "abc_12-def" within the date range specified in the range filter.
But the query below also fetches documents with other IDs (e.g. abc_12-edf, abc_12-pgf) and documents that fall outside the date range. Any advice on how I can express an AND condition here? Thanks.
curl -XPOST 'localhost:9200/indexname/status/_search?pretty=1&size=1000000' -d '{
"query": {
"filtered" : {
"filter": [
{ "term": { "ID": "abc_12-def" }},
{ "range": { "Date": { "gte": "2015-10-01T09:12:11", "lte" : "2015-11-18T10:10:13" }}}
]
}
}
}'
You need to use a bool query for an AND (a.k.a. must) condition:
{
"query": {
"bool": {
"must": [
{
"term": {
"ID": "abc_12-def"
}
},
{
"range": {
"Date": {
"gte": "2015-10-01T09:12:11",
"lte": "2015-11-18T10:10:13"
}
}
}
]
}
}
}
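Also, string fields are analyzed by default with the standard analyzer, which means abc_12-def is tokenized as [abc_12, def]. A term query does not analyze its input, so it looks for the exact string abc_12-def, which is not among those tokens.
If you are looking for an exact match, you should map the field as not_analyzed (in Elasticsearch 5.0 and later, use a keyword field instead). How to map it as not_analyzed is explained here.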
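A rough illustration of the analyzed-versus-term mismatch. The regex below is only a crude stand-in for the standard analyzer (the real tokenizer follows Unicode word-boundary rules), but it gives the same tokens for this ID; approx_standard_analyzer is a hypothetical name.

```python
import re


def approx_standard_analyzer(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word chars.

    Only an approximation; '_' counts as a word character, '-' does not.
    """
    return re.findall(r"\w+", text.lower())


indexed_analyzed = approx_standard_analyzer("abc_12-def")  # ['abc_12', 'def']
indexed_keyword = ["abc_12-def"]  # not_analyzed / keyword: the value stays one token

# A term query looks up its input verbatim, without analysis
print("abc_12-def" in indexed_analyzed)  # False
print("abc_12-def" in indexed_keyword)   # True
```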

Elasticsearch the terms filter raise "filter does not support [mediatest]"

My query looks like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"online": 1
}
},
{
"terms": {
"mediaType": "flash"
}
}
]
}
}
}
}
}
It raises a QueryParsingException, [[comos_v2] [terms] filter does not support [mediaType]], and the field "mediaType" indeed does not exist in the mapping.
My question is: why doesn't the term filter raise the same exception?
The above is not valid Query DSL. In a terms filter, the value for the "mediaType" field must be an array.
It should be the following:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"online": 1
}
},
{
"terms": {
"mediaType": ["flash"]
}
}
]
}
}
}
}
}
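If the query body is generated in code, a small helper can pick term or terms based on the value so that terms always receives an array. exact_clause is a hypothetical helper; the sketch wraps the clauses in a bool query rather than filtered, since the filtered query was removed in Elasticsearch 5.0.

```python
import json


def exact_clause(field, value):
    """Hypothetical helper: 'term' for a single value, 'terms' for a list of values.

    A 'terms' clause needs an array; a bare scalar such as "flash" is rejected.
    Strings are deliberately not treated as lists.
    """
    if isinstance(value, (list, tuple)):
        return {"terms": {field: list(value)}}
    return {"term": {field: value}}


query = {"query": {"bool": {"must": [exact_clause("online", 1), exact_clause("mediaType", ["flash"])]}}}
print(json.dumps(query))
# {"query": {"bool": {"must": [{"term": {"online": 1}}, {"terms": {"mediaType": ["flash"]}}]}}}
```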
It's 2021 and I'm using .keyword for an exact text match, but you can just as easily omit it:
{"query":
{"bool":
{"must":
[
{"term":
{"variable1.keyword":var1Here}
},
{"term":
{"variable2.keyword":var2Here}
}
]
}
}
}
It's simply a matter of "term" vs. "terms"; the singular/plural difference is very easy to miss.
I had a very similar error with this query, in which I was trying to delete a specific zone:
'{"query":{"terms":{"zoneid":25070}}}'
I was getting an error when I ran the above query.
As soon as I changed "terms" to "term", the query executed with no issues, like this:
'{"query":{"term":{"zoneid":25070}}}'