Nested inner hits in FreeText search use case - json

I'm building a standard free-text search on a site that sells cars.
In the search box the user can enter a search term, which is passed on to the query and used to match both nested and non-nested properties.
I'm using inner_hits to limit the number of variants returned by the query (in this sample, variants is not removed from _source).
When matching on the nested property color, the inner_hits collection contains the correct variant, as expected.
However, when matching on the non-nested property title, the inner_hits collection is empty. I understand why it's empty.
Can you suggest a better way to structure the query?
Another option would be to always return at least one variant, but how can that be achieved?
Mappings
PUT test
{
"mappings": {
"car": {
"properties": {
"variants": {
"type": "nested"
}
}
}
}
}
Insert data
PUT test/car/1
{
"title": "VW Golf",
"variants": [
{
"color": "red",
"forsale": true
},
{
"color": "blue",
"forsale": false
}
]
}
Query by color
GET test/_search
{
"query": {
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": "blue"
}
},
"inner_hits": {}
}
}
}
Color query: works as expected!
"hits" : [
{
"_source" : {
"title" : "VW Golf",
"variants" : [
{
"color" : "red",
"forsale" : true
},
{
"color" : "blue",
"forsale" : false
}
]
},
"inner_hits" : {
"variants" : {
"hits" : {
"total" : 1,
"hits" : [
{
"_nested" : {
"field" : "variants",
"offset" : 1
},
"_source" : {
"color" : "blue",
"forsale" : false
}
}
]
}
}
}
}
]
Query by brand
GET test/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "golf"
}
},
{
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": "golf"
}
},
"inner_hits": {}
}
}
]
}
}
}
Brand query result :-(
"hits" : [
{
"_source" : {
"title" : "VW Golf",
"variants" : [
{
"color" : "red",
"forsale" : true
},
{
"color" : "blue",
"forsale" : false
}
]
},
"inner_hits" : {
"variants" : {
"hits" : {
"total" : 0,
"hits" : [ ]
}
}
}
}
]

As you already know, inner_hits returns an empty array because no nested documents matched the nested query.
A simple solution is to change the query so that the nested query always matches. This can be done by wrapping the nested query in a bool query and adding a match_all clause.
If you set the boost of the match_all clause to 0, it will not contribute to the score. Consequently, a nested document that actually matches will be ranked first in the inner hits.
Now the inner hits will not be empty, but there is a second problem: every document will match. You can either:
set a min_score with a very small value (e.g., 0.00000001) to discard documents with a score of 0, or
duplicate the original nested query and set minimum_should_match to 2, as in the query below.
{
"query": {
"bool": {
// Ensure that at least 1 of the first 2 queries will match
// The third query will always match
"minimum_should_match": 2,
"should": [
{
"match": {
"title": <SEARCH_TERM>
}
},
{
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": <SEARCH_TERM>
}
}
}
},
{
"nested": {
"path": "variants",
"query": {
"bool": {
"should": [
{
"match": {
"variants.color": <SEARCH_TERM>
}
},
{
// Disable scoring
"match_all": { "boost": 0 }
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
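Since the search term appears three times in the query above, it can help to generate the body from a single value. The following is a minimal Python sketch; the function name build_car_query is made up for illustration, and the resulting dict still has to be sent to Elasticsearch with whatever client you use.

```python
import json


def build_car_query(search_term):
    """Build the three-clause bool query above for one free-text term.

    build_car_query is an illustrative name, not part of any library.
    """
    color_match = {"match": {"variants.color": search_term}}
    return {
        "query": {
            "bool": {
                # The third clause always matches, so requiring 2 matches
                # means that the title or a variant color matched as well.
                "minimum_should_match": 2,
                "should": [
                    {"match": {"title": search_term}},
                    {"nested": {"path": "variants", "query": color_match}},
                    {
                        "nested": {
                            "path": "variants",
                            "query": {
                                "bool": {
                                    "should": [
                                        color_match,
                                        # Boost 0: keeps inner_hits non-empty
                                        # without adding to the score.
                                        {"match_all": {"boost": 0}},
                                    ]
                                }
                            },
                            "inner_hits": {},
                        }
                    },
                ],
            }
        }
    }


print(json.dumps(build_car_query("golf"), indent=2))
```

Only the third clause carries inner_hits, so each hit still has a single inner_hits entry for variants.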

One way to do it is with a script_fields clause.
You would write a small Painless script that does the following:
store the List you get from variants in a variable
iterate over the Maps in this List
return each Map whose color is blue (if none match, return an empty Map)
This creates an additional field per search result containing only the variants whose color is blue.
One important drawback is that this is a very heavy operation, especially if you have many records.
This approach is acceptable if it is something only you will ever run, maybe a few times a year outside peak hours. If your use case involves regular use by many users, I would change the mapping, return variants as a whole, or choose some other solution.
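To make the filtering steps concrete, here they are sketched in Python rather than Painless. The function name filter_variants is illustrative, and the sample document is the one from the question. The same function could also be applied on the client to _source if you decide to return variants as a whole.

```python
def filter_variants(source, color):
    """Return only the variants whose color equals the requested one.

    Mirrors the steps above: take the list of variants, iterate over it,
    and keep the maps that match. Returns an empty list if none match.
    """
    matching = []
    for variant in source.get("variants", []):
        if variant.get("color") == color:
            matching.append(variant)
    return matching


car = {
    "title": "VW Golf",
    "variants": [
        {"color": "red", "forsale": True},
        {"color": "blue", "forsale": False},
    ],
}

print(filter_variants(car, "blue"))  # [{'color': 'blue', 'forsale': False}]
```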

Related

storing boolean values in elasticsearch : optimization?

I have json documents with entries like :
......
{
"Fieldname" : "booked",
"Fieldvalue" : "yes"
}
...
Within the JSON document there are many fields like this, where a Boolean value is expressed indirectly through Fieldname and Fieldvalue; essentially it signifies that booked=true. Would it be more efficient to transform the JSON before storing it in Elasticsearch, i.e. replacing the above with the following?
{
"booked" : true
}
The search use case is that I want to figure out whether a similar JSON document already exists in the system before adding another one.
Yes, the latter is a much cleaner way to store the data, and it makes searching easier too. Say you want to get all the booked properties from your index: you can do it like this, without the extra Fieldname and Fieldvalue indirection:
GET /properties/_search
{
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"country_code.keyword": "US"
}
},
{
"match": {
"booked": true
}
},
{
"range": {
"usd_price": {
"gte": 50,
"lte": 100000
}
}
}
]
}
},
"sort": [
{
"ranking": {
"order": "desc"
}
}
],
"_source": [
"property_id",
"property_name",
"country",
"country_code",
"state",
"state_abbr",
"city"
]
}
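The transformation itself can be done before indexing. Below is a minimal Python sketch with two assumptions that are not in the question: the Fieldname/Fieldvalue entries are available as a list, and "no" is the counterpart of "yes". The function name flatten_flags is made up for illustration.

```python
def flatten_flags(entries):
    """Collapse Fieldname/Fieldvalue pairs into plain fields.

    Assumption: "yes"/"no" are the only boolean spellings. Any other
    value is kept unchanged under its field name.
    """
    booleans = {"yes": True, "no": False}
    doc = {}
    for entry in entries:
        name = entry["Fieldname"]
        value = entry["Fieldvalue"]
        key = str(value).strip().lower()
        doc[name] = booleans[key] if key in booleans else value
    return doc


print(flatten_flags([{"Fieldname": "booked", "Fieldvalue": "yes"}]))
# {'booked': True}
```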

Mongodb query on triple nested array of object

I'm having trouble writing a query that returns a triply nested value from a document. The documents I'm using are structured like this:
{
"areaname": "name1",
"places": [
{
"placename": "place1",
"objects": [
{
"objname": "obj1",
"tags": [
"tag1",
"tag2"
]
},
{
"objname": "obj2",
"tags": [
"tag6",
"tag7"
]
}
]
},
{
"placename": "place2",
"objects": [
{
"objname": "obj45",
"tags": [
"tag46",
"tag34"
]
},
{
"objname": "obj77",
"tags": [
"tag56",
"tag11"
]
}
]
}
]
}
It should be quite simple, but I can't find a solution for a query like:
"return the objname of every object that contains tag1 in its tags"
So for the given document, if I use "tag1" as a parameter, the query is expected to return "obj1".
It should give me the same result if I use "tag2" as a parameter.
Another example: using "tag56" it should return only "obj77".
Right now I have no problem returning the whole document using dot notation, or top-level fields such as areaname:
db.users.find( {"places.objects.tags":"tag1"}, { areaname: 1, _id:0 } )
Is this even possible?
Keeping it simple:
[
{
"$match" : {
"places.objects.tags" : "tag1"
}
},
{
"$unwind" : "$places"
},
{
"$unwind" : "$places.objects"
},
{
"$match" : {
"places.objects.tags" : "tag1"
}
},
{
"$group" : {
"_id" : "$_id",
"obj_names" : {
"$push" : "$places.objects.objname"
}
}
}
]
You should add any other fields you want to keep to the $group stage.
This can also be done without the double $unwind, but I chose this form for readability.
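To see what the stages do to the sample document, the same logic can be traced in plain Python. This is only an illustration of the pipeline's behaviour on one document, not MongoDB code; the function name objnames_with_tag is made up.

```python
def objnames_with_tag(doc, tag):
    """Collect the objname of every object whose tags contain the tag."""
    names = []
    for place in doc.get("places", []):        # first $unwind
        for obj in place.get("objects", []):   # second $unwind
            if tag in obj.get("tags", []):     # second $match
                names.append(obj["objname"])   # $push in $group
    return names


area = {
    "areaname": "name1",
    "places": [
        {
            "placename": "place1",
            "objects": [
                {"objname": "obj1", "tags": ["tag1", "tag2"]},
                {"objname": "obj2", "tags": ["tag6", "tag7"]},
            ],
        },
        {
            "placename": "place2",
            "objects": [
                {"objname": "obj45", "tags": ["tag46", "tag34"]},
                {"objname": "obj77", "tags": ["tag56", "tag11"]},
            ],
        },
    ],
}

print(objnames_with_tag(area, "tag1"))   # ['obj1']
print(objnames_with_tag(area, "tag56"))  # ['obj77']
```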

Elastic query to show exact match OR other fields if not found

I need some help rewriting my Elasticsearch query.
What I need is:
1- to show a single record if there is an exact match on the two fields verb and sessionid.raw (partial matches are not accepted).
"must": [
{ "match" : { "verb" : "login" } },
{ "term" : { "sessionid.raw" : strSessionID } },
]
OR
2- to show the top 5 records (sorted by _score DESC and #timestamp ASC) that match some other fields, giving a boost if the records are between the specified time range.
"must": [
{ "match" : { "verb" : "login" } },
{ "term" : { "pid" : strPID } },
],
"should": [
{ "match" : { "user.raw" : strUser } },
{ "range" : { "#timestamp" : {
"from" : QueryFrom,
"to" : QueryTo,
"format" : DateFormatElastic,
"time_zone" : "America/Sao_Paulo",
"boost" : 2 }
} },
]
The code below almost does what I want.
Right now it boosts sessionid.raw matches to the top if found, but the remaining records are not discarded.
var objQueryy = {
"fields" : [ "#timestamp", "program", "pid", "sessionid.raw", "user", "frontendip", "frontendname", "_score" ],
"size" : ItemsPerPage,
"sort" : [ { "_score" : { "order": "desc" } }, { "#timestamp" : { "order" : "asc" } } ],
"query" : {
"bool": {
"must": [
{ "match" : { "verb" : "login" } },
{ "term" : { "pid" : strPID } },
{ "bool": {
"should": [
{ "match" : { "user.raw" : strUser } },
{ "match" : { "sessionid.raw": { "query": strSessionID, "boost" : 3 } } },
{ "range" : { "#timestamp" : { "from": QueryFrom, "to": QueryTo, "format": DateFormatElastic, "time_zone": "America/Sao_Paulo" } } },
],
}},
],
},
},
}
Elasticsearch cannot "prune" your secondary results for you when an exact match is also found.
You would have to implement this discarding functionality on the client side after all results had been returned.
You may find that the cleanest implementation is to execute your two search strategies separately. Your search client would:
Run the first (exact match) query
Run the second (expanded) query only if the first returned no results
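The two-step strategy can be sketched as follows in Python. The run_search parameter is a hypothetical callable taking (query, size) and returning a list of hits; in practice it would wrap whatever Elasticsearch client you use. The default of 5 comes from the "top 5 records" requirement in the question.

```python
def search_with_fallback(run_search, exact_query, expanded_query, fallback_size=5):
    """Return the exact match if there is one, otherwise the expanded hits.

    run_search is a hypothetical callable (query, size) -> list of hits.
    """
    exact_hits = run_search(exact_query, 1)
    if exact_hits:
        return exact_hits
    # Only pay for the second request when the exact query found nothing.
    return run_search(expanded_query, fallback_size)
```

This costs a second round trip only when there is no exact match, and it keeps each query simple enough to tune on its own.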

Nested filter numerical range

I have the following json object:
{
"Title": "Terminator",
"Purchases": [
{"Country": "US", "Site": "iTunes", "Price": 4.99},
{"Country": "FR", "Site": "Google", "Price": 5.99}
]
}
I want to be able to find an object by specifying Country + Site + price range. For example, the above should return True for Country=US&Price<5.00, but False for Country=FR&Price<5.00. What would the index and query look like to do this? This is a follow-up to another question: Search within array object.
Simply add a Range query to your Bool query logic tree. This will return documents that match US for country and have the Price field with a numeric value less than 5.
{ "query":
{ "nested" : {
"path" : "Purchases",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"Purchases.Country" : "US"}
},
{
"range" : {
"Purchases.Price" : {
"lt" : 5
}
}
}
]
}
}
}
}
}
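A nested query only matches when all of its conditions hold on the same element of Purchases, which assumes Purchases is mapped as type nested. The expected True/False behaviour from the question can be checked with a small Python sketch; the function name has_purchase is illustrative and this is not Elasticsearch code.

```python
def has_purchase(doc, country, max_price):
    """True if a single purchase has the country and a price below max_price."""
    return any(
        purchase["Country"] == country and purchase["Price"] < max_price
        for purchase in doc.get("Purchases", [])
    )


movie = {
    "Title": "Terminator",
    "Purchases": [
        {"Country": "US", "Site": "iTunes", "Price": 4.99},
        {"Country": "FR", "Site": "Google", "Price": 5.99},
    ],
}

print(has_purchase(movie, "US", 5.00))  # True
print(has_purchase(movie, "FR", 5.00))  # False
```

Without the nested mapping, the fields of all purchases are flattened together, so FR combined with a price below 5 would match this document because the two values come from different purchases.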

elasticsearch request an element in an array

I have a document indexed in my elastic search like:
{
...
purchase:{
zones: ["FR", "GB"]
...
}
...
}
I use this kind of query to find, for example, documents whose purchase zones include GB:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"purchase.zones": "GB"
}
}
}
}
}
But with it I get no results...
I would like to perform a query equivalent to PHP's in_array("GB", purchase.zones).
Any help would be appreciated.
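If your "purchase" field is mapped as a nested type, then you have to use a nested query to access "zones". The example below uses placeholder names (obj1, obj1.name, obj1.count); replace them with your own path and fields.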
{
"nested" : {
"path" : "obj1",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"obj1.name" : "blue"}
},
{
"range" : {"obj1.count" : {"gt" : 5}}
}
]
}
}
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
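Adapted to the document from the question, the nested query might look like the body built below. This is a sketch under the assumption that purchase really is mapped as nested; it uses a match clause, as the example above does, rather than the term filter from the question. The function name build_zone_query is made up for illustration.

```python
import json


def build_zone_query(zone):
    """Build a nested query for documents whose purchase.zones contain zone.

    Assumes "purchase" is mapped as a nested type.
    """
    return {
        "query": {
            "nested": {
                "path": "purchase",
                "query": {"match": {"purchase.zones": zone}},
            }
        }
    }


print(json.dumps(build_zone_query("GB"), indent=2))
```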