Elastic Nested query - display only the first 2 inner hits - json

How do I change my query to display only the first 5 orders within each orderbook?
My data is structured like this. Orders is a nested type.
Orderbook
|_ Orders
This is my query
GET /orderindex/_search
{
"size": 10,
"query": {
"term": { "_type": "orderbook" }
}
}
And this is the result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 1,
"hits": [
{
"_index": "orderindex",
"_type": "orderbook",
"_id": "1",
"_score": 1,
"_source": {
"id": 1,
"exchange": "Exch1",
"label": "USD/CAD",
"length": 40,
"timestamp": "5/16/2018 4:33:31 AM",
"orders": [
{
"pair1": "USD",
"total": 0.00183244,
"quantity": 61,
"orderbookId": 0,
"price": 0.00003004,
"exchange": "Exch1",
"id": 5063,
"label": "USD/CAD",
"pair2": "CAD"
},
{
"pair1": "USD",
"total": 0.0231154,
"quantity": 770,
"orderbookId": 0,
"price": 0.00003002,
"exchange": "Exch1",
"id": 5064,
"label": "USD/CAD",
"pair2": "CAD"
},
...
..
.
Also, how do I query two specific orderbooks by label and retrieve only the first 2 orders of each?
I am currently sending the query below. The problem is that it returns each orderbook with all of its orders, followed by the 2 inner hits. How do I return only the 2 inner hits, without the full list of orders that comes back with the orderbook from the first part of the query?
GET /orderindex/_search
{
"query": {
"bool": {
"must": [
{
"term": { "_type": "orderbook" }
},
{
"nested": {
"path": "orders",
"query": {
"match_all": {}
},
"inner_hits": {
"size": 2
}
}
}
]
}
}
}

Inner hits support the following options:
size
The maximum number of hits to return per inner_hits. By default the
top three matching hits are returned.
This means you can limit the number of nested orders returned by using a query similar to this one:
{
"query": {
"bool": {
"must": [
{
# replace this clause with your own query for the orderbook
"term": {
"user": "kimchy"
}
},
{
"nested": {
"path": "orders",
"query": {
"match_all": {}
},
"inner_hits": {
"size": 3 # return at most 3 inner hits per orderbook
}
}
}
]
}
}
}
You may also want to filter _source in the results, like this:
"_source": {
"includes": [ "obj1.*", "obj2.*" ],
"excludes": [ "*.description" ]
}
In your case it could be useful to exclude orders.* so that the full list of orders is not returned with each orderbook.
More information on this one - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html
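Putting the two pieces above together, the request body could be assembled along these lines. This is only a sketch: the helper name build_orderbook_query and the second label "EUR/USD" are made up for illustration, and it assumes label is indexed as an exact-value (keyword-style) field so that a terms query can match it. The _type term from the question is left out here, so add it back if your index needs it.

```python
import json


def build_orderbook_query(labels, orders_per_book=2):
    """Build a search body that returns each matching orderbook without its
    full orders array, plus at most `orders_per_book` orders as inner hits.

    Assumption: `label` is an exact-value field, so `terms` matches it as-is.
    """
    return {
        # Keep the nested orders out of the top-level _source;
        # the ones we want come back through inner_hits instead.
        "_source": {"excludes": ["orders.*"]},
        "query": {
            "bool": {
                # Restrict the search to the requested orderbooks.
                "filter": [
                    {"terms": {"label": list(labels)}},
                ],
                "must": [
                    {
                        "nested": {
                            "path": "orders",
                            "query": {"match_all": {}},
                            # Cap the number of orders returned per orderbook.
                            "inner_hits": {"size": orders_per_book},
                        }
                    }
                ],
            }
        },
    }


# "EUR/USD" is a hypothetical second label, not taken from the question.
body = build_orderbook_query(["USD/CAD", "EUR/USD"], orders_per_book=2)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET /orderindex/_search. For the first question, the same helper called with orders_per_book=5 would ask for up to 5 orders per orderbook.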

Related

How to match on multiple fields per array item in elastic search

I am trying to create an elastic search query to match multiple fields inside of an object inside of an array.
For example, the Elastic Search structure I am querying against is similar to the following:
"hits": [
{
"_index": "titles",
"_type": "title",
...
"_source": {
...
"genres": [
{
"code": "adventure",
"priority": 1
},
{
"code": "action",
"priority": 2
},
{
"code": "horror",
"priority": 3
}
],
...
},
...
]
And what I am trying to do is match on titles with specific genre/priority pairings. For example, I am trying to match all titles with code=action and priority=1, but my query is returning too many results. The above title is hit during this example due to the fact that the genre list contains both a genre with code=action AND another genre that matches priority=1. My query is similar to the following:
"query": {
"bool": {
"filter": [
{
"bool": {
"must":[
{"term": {
"genres.code": {
"value": "action",
"boost": 1.0
}
}},
{"term": {
"genres.priority": {
"value": 1,
"boost": 1.0
}
}}
]
}
},
...
}
Is there any way to form the query in order to match a title with a single genre containing both priority=1 AND code=action?
I have recreated your problem. I added the following mapping
PUT titles
{
"mappings": {
"title": {
"properties": {
"author": {
"type": "text"
},
"genres": {
"type": "nested"
}
}
}
}
}
Then I added values to the index. This was what was inserted
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "titles",
"_type": "title",
"_id": "2",
"_score": 1,
"_source": {
"author": "Author 1",
"genres": [
{
"code": "adventure",
"priority": 2
},
{
"code": "action",
"priority": 3
},
{
"code": "horror",
"priority": 1
}
]
}
},
{
"_index": "titles",
"_type": "title",
"_id": "1",
"_score": 1,
"_source": {
"author": "Author 2",
"genres": [
{
"code": "adventure",
"priority": 3
},
{
"code": "action",
"priority": 1
},
{
"code": "horror",
"priority": 2
}
]
}
},
{
"_index": "titles",
"_type": "title",
"_id": "3",
"_score": 1,
"_source": {
"author": "Author 3",
"genres": [
{
"code": "adventure",
"priority": 3
},
{
"code": "action",
"priority": 1
},
{
"code": "horror",
"priority": 2
}
]
}
}
]
}
My query is:
GET titles/title/_search
{
"query": {
"nested": {
"path": "genres",
"query": {
"bool": {
"must": [
{
"term": {
"genres.code": {
"value": "horror"
}
}
},
{
"term": {
"genres.priority": {
"value": 1
}
}
}
]
}
}
}
}
}
The query returns
"_source": {
"author": "Author 1",
"genres": [
{
"code": "adventure",
"priority": 2
},
{
"code": "action",
"priority": 3
},
{
"code": "horror",
"priority": 1
}
]
}
This title is the only one that has code = 'horror' and priority = 1.
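To see why the nested mapping makes the difference, here is a small Python simulation of the two matching behaviours. It does not call Elasticsearch; the function names are made up, and it is only a rough model of how a plain object array is flattened into one multi-valued field per key, while nested objects are matched one at a time. The genres list is the one from the question.

```python
def flatten(objects, prefix):
    """Mimic a plain object array: one multi-valued list per field, with the
    link between values that came from the same object lost."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{prefix}.{key}", []).append(value)
    return flat


def matches_flattened(objects, prefix, wanted):
    """Each wanted value only has to occur somewhere in the array."""
    flat = flatten(objects, prefix)
    return all(
        value in flat.get(f"{prefix}.{key}", [])
        for key, value in wanted.items()
    )


def matches_nested(objects, wanted):
    """All wanted values have to occur together in a single object."""
    return any(
        all(obj.get(key) == value for key, value in wanted.items())
        for obj in objects
    )


genres = [
    {"code": "adventure", "priority": 1},
    {"code": "action", "priority": 2},
    {"code": "horror", "priority": 3},
]
wanted = {"code": "action", "priority": 1}

print(matches_flattened(genres, "genres", wanted))  # True: values come from two different genres
print(matches_nested(genres, wanted))               # False: no single genre has both
```

The first result is the false positive described in the question; the second is what the nested query gives you.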

Hide timed_out, _shards, max_score and other default fields from output

When I perform any search query in Elasticsearch, I get output with these fields added automatically:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 17,
"successful": 17,
"failed": 0
},
"hits": {
"total": 122,
"max_score": 10.268,
"hits": [
{
"_index": "imdb",
"_type": "txt",
"_id": "f4c8929735ad",
"_score": 11.775636,
My desired fields are everything under _source:
"_source": {
"actor_name": {
"attribute": "value2"
},
"age_data":{
"perm": 29
}
}
How can I filter out everything from displaying at the output except _source in elasticsearch?
You could use the filter_path parameter to return only the parts of the response you want. Something like this should help:
host:port/_search?pretty&filter_path=hits.hits._source
In this case the response would look like this:
{
"hits": {
"hits": [
{
"_source": {
"type": "type",
"date": "2018-05-10T16:54:54.162Z"
}
},
{
"_source": {
"type": "type",
"date": "2018-05-14T10:39:15.903Z"
}
}
]
}
}
More information on filter path - https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#common-options-response-filtering
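If you are calling the search endpoint from code, the same idea can be sketched in Python. The host http://localhost:9200 and the helper names are assumptions for illustration, and no request is actually sent here; the sample response is the filtered one shown above.

```python
from urllib.parse import urlencode


def search_url(base_url, index, filter_path="hits.hits._source"):
    """Build a _search URL that asks Elasticsearch to return only the
    response fields named in filter_path."""
    query = urlencode({"filter_path": filter_path})
    return f"{base_url}/{index}/_search?{query}"


def sources(response):
    """Pull the bare documents out of a search response. The .get() calls
    guard against a filtered response that has no hits section at all."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


filtered_response = {
    "hits": {
        "hits": [
            {"_source": {"type": "type", "date": "2018-05-10T16:54:54.162Z"}},
            {"_source": {"type": "type", "date": "2018-05-14T10:39:15.903Z"}},
        ]
    }
}

# Assumed host; replace with your own host:port.
print(search_url("http://localhost:9200", "imdb"))
print(sources(filtered_response))
```

The second helper works on unfiltered responses too, so you can also strip the extra fields on the client side if you prefer.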

Elasticsearch aggregations filtered result is not working properly

two sample documents
POST /aggstest/test/1
{
"categories": [
{
"type": "book",
"words": [
{"word":"storm","count":277},
{"word":"pooh","count":229}
]
},
{
"type": "magazine",
"words": [
{"word":"vibe","count":100},
{"word":"sunny","count":50}
]
}
]
}
POST /aggstest/test/2
{
"categories": [
{
"type": "book",
"words": [
{"word":"rain","count":160},
{"word":"jurassic park","count":150}
]
},
{
"type": "megazine",
"words": [
{"word":"tech","count":200},
{"word":"homes","count":30}
]
}
]
}
aggs query
GET /aggstest/test/_search
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"categories.type": "book"
}
},
{
"term": {
"categories.words.word": "storm"
}
}
]
}
}
}
},
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"categories.type": "book"
}
}
]
}
},
"aggs": {
"book_category": {
"terms": {
"field": "categories.words.word",
"size": 10
}
}
}
}
},
"post_filter": {
"term": {
"categories.type": "book"
}
}
}
result
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"filtered": {
"doc_count": 1,
"book_category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "pooh",
"doc_count": 1
},
{
"key": "storm",
"doc_count": 1
},
{
"key": "sunny",
"doc_count": 1
},
{
"key": "vibe",
"doc_count": 1
}
]
}
}
}
}
========================
The expected aggs result should not include "sunny" and "vibe", because they belong to the "magazine" type.
I used a filter query and post_filter, but I couldn't get an aggs result for the "book" type only.
All the filters you apply (in the query and in the aggregation) select whole documents, not individual categories entries. A matching document contains all 4 words, and that whole document is the scope of the aggregation. Hence you always get all 4 buckets.
As far as I understand, a way to manipulate buckets on the server side is to be introduced with reducers in version 2.0 of Elasticsearch.
What you can do now is change the mapping so that categories is a nested object. You will then be able to query the categories independently and aggregate on them with a nested aggregation. Changing the type from object to nested requires reindexing.
Also note that post filters are not applied to aggregations at all. They filter the hits after the aggregations have been computed, for cases where you need to aggregate on a wider scope than the returned hits.
One more thing: if the filter is already in the query, there is no need to repeat it in the aggregation, because the scope is already filtered.
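To make the scope argument concrete, here is a rough Python sketch. The two functions only simulate how buckets are collected with an object mapping versus a nested mapping; they do not talk to Elasticsearch, and their names are made up. NESTED_AGGS is what the aggregation part of the request might look like once categories is mapped as nested; the aggregation names in it are arbitrary.

```python
from collections import Counter

# Document 1 from the question.
doc = {
    "categories": [
        {"type": "book", "words": [{"word": "storm", "count": 277},
                                   {"word": "pooh", "count": 229}]},
        {"type": "magazine", "words": [{"word": "vibe", "count": 100},
                                       {"word": "sunny", "count": 50}]},
    ]
}


def buckets_document_scope(docs, category_type):
    """Object mapping: a document that contains the category anywhere
    contributes every word it holds, whatever category the word is under."""
    counts = Counter()
    for d in docs:
        if any(c["type"] == category_type for c in d["categories"]):
            counts.update({w["word"] for c in d["categories"] for w in c["words"]})
    return dict(counts)


def buckets_nested_scope(docs, category_type):
    """Nested mapping: each category is matched on its own, so only the
    words of matching categories are counted."""
    counts = Counter()
    for d in docs:
        for c in d["categories"]:
            if c["type"] == category_type:
                counts.update({w["word"] for w in c["words"]})
    return dict(counts)


# Sketch of the request body, assuming categories is mapped as nested.
NESTED_AGGS = {
    "size": 0,
    "aggs": {
        "categories": {
            "nested": {"path": "categories"},
            "aggs": {
                "books": {
                    "filter": {"term": {"categories.type": "book"}},
                    "aggs": {
                        "book_words": {
                            "terms": {"field": "categories.words.word", "size": 10}
                        }
                    },
                }
            },
        }
    },
}

print(sorted(buckets_document_scope([doc], "book")))  # ['pooh', 'storm', 'sunny', 'vibe']
print(sorted(buckets_nested_scope([doc], "book")))    # ['pooh', 'storm']
```

The first list is what the question's result shows; the second is the result the question expects.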

Elasticsearch combined query and filter not giving correct results

I'm trying to make a search page with extra filter items, but I can't get my query to work the way I want.
Here's the query example:
{
"size": 25,
"from": 0,
"sort": {
"_score": {
"order": "asc"
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"year": "2015"
}
}
]
}
},
"query": {
"match": {
"title": "Sense"
}
}
}
}
}
I want only results that are from 2015. Searching for the title 'Sense' comes up with nothing, even though there is a row with the title 'Sense8'. If I search for 'Sense8', it returns the correct data, but 'Sense' does not.
What am I doing wrong?
Thanks
You probably need to use an ngram or edge ngram analyzer in your mapping. I wrote a blog post about using ngrams for autocomplete on the Qbox blog that goes through it in some detail, but here is some code that might give you what you want:
PUT /test_index
{
"settings": {
"analysis": {
"filter": {
"ngram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"ngram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"ngram_filter"
]
},
"whitespace_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"year":{
"type": "string"
},
"title":{
"type": "string",
"index_analyzer": "ngram_analyzer",
"search_analyzer": "whitespace_analyzer"
}
}
}
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"year": "2015","title":"Sense8"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"year": "2014","title":"Something else"}
POST /test_index/_search
{
"size": 25,
"from": 0,
"sort": {
"_score": {
"order": "asc"
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"year": "2015"
}
}
]
}
},
"query": {
"match": {
"title": "Sense"
}
}
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": null,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.30685282,
"_source": {
"year": "2015",
"title": "Sense8"
},
"sort": [
0.30685282
]
}
]
}
}
You can run the code in your browser here:
http://sense.qbox.io/gist/4f72c182db2017ac7d32077af16cbc3528cb79f0
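If it helps to see why this works, here is a rough Python stand-in for the two analyzers above. It is only an approximation (asciifolding is skipped and the function names are made up), but it shows the terms that end up in the index for 'Sense8' and why a search for 'Sense' now finds one of them.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Prefixes of `token` that are between min_gram and max_gram characters long."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    """Rough stand-in for ngram_analyzer: whitespace split, lowercase,
    then edge n-grams of every token."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


def search_terms(text):
    """Rough stand-in for whitespace_analyzer: whitespace split and
    lowercase only, so the search text is not broken into n-grams."""
    return set(text.lower().split())


indexed = index_terms("Sense8")
print(sorted(indexed))                        # ['se', 'sen', 'sens', 'sense', 'sense8']
print(bool(search_terms("Sense") & indexed))  # True: 'sense' is one of the indexed prefixes
```

Without the n-grams, the only indexed term would be 'sense8', and the search term 'sense' would not equal it, which is the behaviour described in the question.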

Returning term count for a single document using the terms facet in elastic search

Say I have the following search query...
POST /topics/_search
{
"fields": [
"topic_attachment",
"topic_replies",
"topic_status"
],
"query" : {
"filtered" : {
"query" : {
"term" : {
"_id" : "5478"
}
}
}
},
"facets": {
"text": {
"terms": {
"field": "text",
"size": 10,
"order": "count"
}
}
}
}
The result of this search is the following.
{
"took": 93,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "topics",
"_type": "full-topic",
"_id": "5478",
"_score": 1,
"fields": {
"topic_replies": 1141,
"topic_status": 0,
"topic_attachment": false
}
}
]
},
"facets": {
"text": {
"_type": "terms",
"missing": 0,
"total": 8058,
"other": 8048,
"terms": [
{
"term": "ω",
"count": 1
},
{
"term": "œyouâ",
"count": 1
},
{
"term": "œyou",
"count": 1
},
{
"term": "œwhisperedâ",
"count": 1
},
{
"term": "œwalt",
"count": 1
},
{
"term": "œunderstandingâ",
"count": 1
},
{
"term": "œtieâ",
"count": 1
},
{
"term": "œthe",
"count": 1
},
{
"term": "œpersonally",
"count": 1
},
{
"term": "œnappiesâ",
"count": 1
}
]
}
}
}
Each term has a count of exactly 1. Why is this? I know the text from this document has more than one term in common. Is this because the term count only increments once per document? If so, how do I count a term more than once from a single document?
That's the document count, not the term frequency. Luckily, with the new aggregations module (the replacement for facets, introduced in 1.0.Beta2), count has been renamed to doc_count to clarify what it is.
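The difference between the two numbers can be shown with a few lines of Python. This is just an illustration with a made-up sentence and made-up function names, not something that runs against Elasticsearch.

```python
from collections import Counter


def doc_counts(docs):
    """For each term, the number of documents that contain it at least once.
    This is what the facet's count (doc_count in aggregations) reports."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))  # a document counts once per term
    return counts


def term_frequencies(tokens):
    """For each term, the number of times it occurs within one document."""
    return Counter(tokens)


# Hypothetical document text, already split into terms.
topic = "the cat saw the dog and the bird".split()

print(doc_counts([topic])["the"])      # 1: only one document matched the query
print(term_frequencies(topic)["the"])  # 3: the term occurs three times in it
```

Since your query matches a single document by _id, every term in the facet can only ever reach a count of 1.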