ElasticSearch - Combining query match with wildcard - json

I'm still fairly new to ElasticSearch, and I'm trying to understand why I can't combine a wildcard query with a match query.
Take this JSON body for example:
{
"size":"10",
"from":0,
"index":"example",
"type":"logs",
"body":{
"query":{
"match":{
"account":"1234"
},
"wildcard":{
"_all":"*test*"
}
},
"sort":{
"timestamp":{
"order":"desc"
}
}
}
}
It returns with the error "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed;"
(Full dump: http://pastebin.com/uJJZm8fQ)
However, if I remove either the wildcard or match key from the request body - it returns results as expected.
I've been going through the documentation and I'm not really able to find any relevant content at all.
At first I thought it was to do with the _all parameter, but even if I explicitly specify a key, the same result occurs.
Before I assume that I should be using the 'bool' operator, or something alike to mix my query types, is there any explanation for this?

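The exception says that it does not understand the field "index". When querying Elasticsearch directly, you include the index name and type in the URL rather than in the request body. Note also that a query object accepts only one top-level query clause, so placing match and wildcard side by side is not valid; to require both, wrap them in a bool query. There is no wildcard search in a match query. There is a wildcard search in the query_string query.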
Your query should be something like this with match:
POST /example/logs/_search
{
"size": 10,
"from": 0,
"query" : {
"match": {
"account": "1234"
}
},
"sort": {
"timestamp" : {
"order": "desc"
}
}
}
Or something like this with query_string:
POST /example/logs/_search
{
"size": 10,
"from": 0,
"query" : {
"query_string": {
"default_field": "account",
"query": "*1234*"
}
},
"sort": {
"timestamp" : {
"order": "desc"
}
}
}
EDIT: Adding an example of a wildcard query (the wildcard clause needs a field name, here _all as in the question):
POST /example/logs/_search
{
"size": 10,
"from": 0,
"query" : {
"wildcard": {
"_all": "*test*"
}
},
"sort": {
"timestamp" : {
"order": "desc"
}
}
}
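To answer the "should I be using bool" part of the question, here is a minimal Python sketch that builds a request body with both clauses under bool/must. The helper name build_combined_query and its wildcard_field parameter are made up for illustration; the field names come from the question, and the resulting dict would be sent as the body of POST /example/logs/_search.

```python
import json


def build_combined_query(account, pattern, wildcard_field="_all"):
    """Build a search body that requires both a match and a wildcard clause.

    Hypothetical helper: the two clauses are wrapped in bool/must because
    a query object takes only one top-level clause.
    """
    return {
        "size": 10,
        "from": 0,
        "query": {
            "bool": {
                "must": [
                    {"match": {"account": account}},
                    {"wildcard": {wildcard_field: pattern}},
                ]
            }
        },
        "sort": {"timestamp": {"order": "desc"}},
    }


# Serialize the body exactly as it would be posted.
request_body = json.dumps(build_combined_query("1234", "*test*"), indent=2)
```

Using must means a document has to satisfy both clauses; swapping it for should would make either clause sufficient.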

Related

Elasticsearch exact phrase match on JSON

I am working on exact phrase matching against a JSON field using Elasticsearch. I have tried multiple syntaxes like multi_match, query_string and simple_query_string, but they do not return results that exactly match the given phrase.
The query_string syntax that I am using:
{
"query":{
"query_string":{
"fields":[
"json.*"
],
"query":"\"legal advisor\"",
"default_operator":"OR"
}
}
}
I also tried filter instead of query, but filter does not return any results on json. The syntax I used for filter is:
{
"query": {
"bool": {
"filter": {
"match": {
"json": "legal advisor"
}
}
}
}
}
Now the question is:
Is it possible to perform an exact match operation on JSON using Elasticsearch?
You can try using a multi_match query with type phrase:
{
"query": {
"multi_match": {
"query": "legal advisor",
"fields": [
"json.*"
],
"type": "phrase"
}
}
}
Since you have not provided sample docs or the expected results, I am assuming you are looking for a phrase match. Adding a working sample.
Index the sample docs below, which will also generate the index mapping:
{
"title" : "legal advisor"
}
{
"title" : "legal expert advisor"
}
Now if you are looking for an exact phrase search on legal advisor, use the query below:
{
"query": {
"match_phrase": {
"title": "legal advisor"
}
}
}
This returns only the first doc:
"hits": [
{
"_index": "64989158",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"title": "legal advisor"
}
}
]
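To see why only the first doc comes back, here is a rough Python sketch of what a phrase match with no slop checks: the query terms must appear next to each other and in the same order. This is a deliberate simplification (whitespace tokenizing plus lowercasing, not the real analyzer), and the function names are invented for the example.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def phrase_matches(phrase, field_value):
    """Return True if the phrase terms occur adjacently and in order."""
    terms = tokenize(phrase)
    tokens = tokenize(field_value)
    if not terms:
        return False
    width = len(terms)
    # Slide a window of the phrase length over the field tokens.
    return any(
        tokens[start:start + width] == terms
        for start in range(len(tokens) - width + 1)
    )


docs = ["legal advisor", "legal expert advisor"]
hits = [title for title in docs if phrase_matches("legal advisor", title)]
```

In the second doc the word "expert" sits between the two terms, so the window never lines up and the doc is skipped.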

storing boolean values in elasticsearch : optimization?

I have json documents with entries like :
......
{
"Fieldname" : "booked",
"Fieldvalue" : "yes"
}
...
Within the json document, there are many fields like this, where Boolean value is indirectly mentioned using Fieldname and Fieldvalue : Essentially it signifies that booked=true. Would it be more efficient to transform the json before storing it in elasticsearch ? I.e. replacing the above with :
{
"booked" : true
}
? The search use case is that I want to figure out whether similar json already exists in the system before adding another json.
Yes, the latter is a much cleaner way to store the data, both for storage and for search. Say you want to get all the booked properties from your index: you can query the boolean field directly, instead of going through the extra Fieldname and Fieldvalue:
GET /properties/_search
{
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"country_code.keyword": "US"
}
},
{
"match": {
"booked": true
}
},
{
"range": {
"usd_price": {
"gte": 50,
"lte": 100000
}
}
}
]
}
},
"sort": [
{
"ranking": {
"order": "desc"
}
}
],
"_source": [
"property_id",
"property_name",
"country",
"country_code",
"state",
"state_abbr",
"city"
]
}
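The transformation described in the question could be done before indexing with something like the Python sketch below. It assumes the Fieldname/Fieldvalue entries arrive as a list of dicts and that "yes"/"no" (and "true"/"false") are the only spellings of boolean values; both are guesses about the data, and flatten_flags is a made-up name.

```python
# Assumed spellings of boolean values; adjust to the real data.
TRUE_VALUES = {"yes", "true"}
FALSE_VALUES = {"no", "false"}


def flatten_flags(entries):
    """Turn [{"Fieldname": ..., "Fieldvalue": ...}, ...] into a flat dict.

    Values that look like booleans become real booleans; everything
    else is copied through unchanged.
    """
    doc = {}
    for entry in entries:
        name = entry["Fieldname"]
        value = entry["Fieldvalue"]
        lowered = str(value).strip().lower()
        if lowered in TRUE_VALUES:
            doc[name] = True
        elif lowered in FALSE_VALUES:
            doc[name] = False
        else:
            doc[name] = value
    return doc


flat = flatten_flags([{"Fieldname": "booked", "Fieldvalue": "yes"}])
```

With the flat form, checking whether a similar document already exists becomes a plain query on booked rather than a pair of conditions on Fieldname and Fieldvalue.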

Nested inner hits in FreeText search use case

I'm building a standard free-text search on a site that sells cars.
In the search box the user can enter a search word that is passed on to the query, where it is used to match both nested and non-nested properties.
I'm using inner_hits to limit the number of variants returned by the query (in this sample, variants is not removed from _source).
When matching on the nested property color, the inner_hits collection contains the correct variant as expected.
However, when matching on the non-nested property title, the inner_hits collection is empty. I understand why it's empty.
Can you suggest a better way to structure the query?
Another option would be to always return at least 1 variant, but how can that be achieved?
Mappings
PUT test
{
"mappings": {
"car": {
"properties": {
"variants": {
"type": "nested"
}
}
}
}
}
Insert data
PUT test/car/1
{
"title": "VW Golf",
"variants": [
{
"color": "red",
"forsale": true
},
{
"color": "blue",
"forsale": false
}
]
}
Query by color
GET test/_search
{
"query": {
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": "blue"
}
},
"inner_hits": {}
}
}
}
Color query: works as expected!
"hits" : [
{
"_source" : {
"title" : "VW Golf",
"variants" : [
{
"color" : "red",
"forsale" : true
},
{
"color" : "blue",
"forsale" : false
}
]
},
"inner_hits" : {
"variants" : {
"hits" : {
"total" : 1,
"hits" : [
{
"_nested" : {
"field" : "variants",
"offset" : 1
},
"_source" : {
"color" : "blue",
"forsale" : false
}
}
]
}
}
}
}
]
Query by brand
GET test/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "golf"
}
},
{
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": "golf"
}
},
"inner_hits": {}
}
}
]
}
}
}
Brand query result :-(
"hits" : [
{
"_source" : {
"title" : "VW Golf",
"variants" : [
{
"color" : "red",
"forsale" : true
},
{
"color" : "blue",
"forsale" : false
}
]
},
"inner_hits" : {
"variants" : {
"hits" : {
"total" : 0,
"hits" : [ ]
}
}
}
}
]
As you already know, inner_hits returns an empty array because no nested documents matched the nested query.
A simple solution is to change the query so that the nested query always matches. This can be done by wrapping the nested query in a bool query and adding a match_all query.
If you set the boost of the match_all query to 0, it will not contribute to the score. Consequently, if a nested document matches, it will be ranked first.
Now the inner hits will not be empty, but there is a second problem: all the documents will match. You can either:
set a min_score with a very small value (e.g., 0.00000001) to discard documents with a score of 0
duplicate the original nested query and use a minimum_should_match of 2.
{
"query": {
"bool": {
// Ensure that at least 1 of the first 2 queries will match
// The third query will always match
"minimum_should_match": 2,
"should": [
{
"match": {
"title": <SEARCH_TERM>
}
},
{
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": <SEARCH_TERM>
}
}
}
},
{
"nested": {
"path": "variants",
"query": {
"bool": {
"should": [
{
"match": {
"variants.color": <SEARCH_TERM>
}
},
{
// Disable scoring
"match_all": { "boost": 0 }
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
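Since the body above uses comments and <SEARCH_TERM> placeholders, it cannot be sent as is. A small Python sketch that fills in the term and produces plain JSON-compatible data might look like this; build_free_text_query is a hypothetical helper name, and the field names are the ones from the question.

```python
def color_match(term):
    """Match clause on the nested color field (fresh dict on every call)."""
    return {"match": {"variants.color": term}}


def build_free_text_query(term):
    """Build the three-clause bool query described above.

    Clause 3 always matches (match_all with boost 0) and carries
    inner_hits, so minimum_should_match=2 forces clause 1 or 2 to match.
    """
    return {
        "query": {
            "bool": {
                "minimum_should_match": 2,
                "should": [
                    {"match": {"title": term}},
                    {"nested": {"path": "variants", "query": color_match(term)}},
                    {
                        "nested": {
                            "path": "variants",
                            "query": {
                                "bool": {
                                    "should": [
                                        color_match(term),
                                        {"match_all": {"boost": 0}},
                                    ]
                                }
                            },
                            "inner_hits": {},
                        }
                    },
                ],
            }
        }
    }


body = build_free_text_query("golf")
```

The dict can be serialized with json.dumps and used as the body of GET test/_search.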
One way to do it is using a script_fields clause.
You would write a small script in Painless that does the following:
store the List you get from variants in a variable
then iterate over the Maps in this List
if a Map has color blue, return that Map (if none evaluates to true, return an empty Map).
This would create an additional field per search result with only those variants where the color is blue.
One important drawback is that this is a very heavy operation, especially if you have many records.
You can take this approach if it is something only you will ever do, maybe a few times a year outside peak hours. If your use case is something with regular use and to be performed by many users, I would change the mapping, return variants as a whole or choose some other solution.
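The steps listed above are written here in Python rather than Painless, purely to show the logic; it is not a script you can paste into script_fields. The sketch returns a list of all matching variants (empty when none match) instead of a single Map, which is a small deviation from the description, and matching_variants is an invented name.

```python
def matching_variants(source, color):
    """Return the variants of one document whose color equals `color`."""
    variants = source.get("variants", [])  # the List of Maps
    return [variant for variant in variants if variant.get("color") == color]


golf = {
    "title": "VW Golf",
    "variants": [
        {"color": "red", "forsale": True},
        {"color": "blue", "forsale": False},
    ],
}
blue_only = matching_variants(golf, "blue")
```

The same filtering could also be done client-side on the returned _source, which avoids running a script for every hit.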

How to perform partial matching on _id in Elastic search

I am trying to perform a partial word matching on the _id field in my Elastic search instance.
After searching the official documentation I found out that the best way to do this is to create an n-gram analyzer, so using Sense I did this:
PUT /index2
{"settings": {
"number_of_shards": 1,
"analysis": {
"filter": {
"partial_filter": {
"type": "ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"partial": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"partial_filter"
]
}
}
}
}}
I have tried to test the analyzer using :
POST /index2/_analyze
{
"analyzer": "partial",
"text": "brown fox"
}
And it worked as expected producing proper combinations.
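For reference, the kind of output such an analyzer produces can be approximated in Python as below. This is only a sketch of the idea (split into words, lowercase, emit every substring between min_gram and max_gram characters); it does not claim to reproduce the exact token order or positions that Elasticsearch reports, and partial_tokens is a made-up name.

```python
import re


def partial_tokens(text, min_gram=2, max_gram=20):
    """Approximate a word tokenizer + lowercase + n-gram filter chain."""
    grams = []
    for word in re.findall(r"\w+", text.lower()):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break  # no longer gram fits from this start position
                grams.append(word[start:start + size])
    return grams


tokens = partial_tokens("brown fox")
```

For "fox" this yields fo, fox and ox, so a search for any of those fragments would hit a field indexed with the analyzer.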
The next step should be to apply the analyzer to the relevant fields, so I tried to do this:
PUT /index2/_mapping/type2
{
"type2": {
"properties": {
"_id": {
"type": "string",
"analyzer": "partial"
}
}
}
}
But I am getting an error:
"reason": "Field [_id] is defined twice in [type2]"
Probably because _id field gets created during the index2 creation along with the analyzer.
So my question is how can I use the partial search on the _id field?
Is there any other way to do this?
Thanks in advance!

Sort or Order by specific values with Elasticsearch 5

Trying to figure out how to sort elasticsearch results so that fields with specific values always show first. In this case, I want specific SKUs to show first when showing category pages (I'm using bool query to generate elasticsearch results for category pages).
If I were trying to accomplish this with MySQL, I'd use the case statement:
ORDER BY CASE sku
WHEN 'sku1' then 1 WHEN 'sku2' then 2 WHEN 'sku3' then 3 ELSE 4 END
This query executes:
{
"sort" : [
{
"_script": {
"type": "number",
"script": {
"inline" : "params.sortOrder.indexOf(doc['skuid_text'].value)",
"params": {
"sortOrder": [
"SKUID1",
"SKUID2",
"SKUID3"
]
}
},
"order": "asc"
}
}
],
"query" :{
"bool" : {
"must" : [
{
"term" : {
"category_codes" : "CATEGORY1"
}
}
]
}
}
}
But it's returning "-1" as the sort value for all records, e.g.:
"sort": [
-1
]
Note: 'skuid_text' is the SKU field, which I have mapped as the "keyword" type. I have tried both doc['skuid_text'].value and doc['skuid_text'], and I have verified that the SKUs in the "sortOrder" array are definitely included in the result set.
What am I missing? Or, is there a completely different way to approach the problem?
Actually the original query DOES kind of work, it's just that the way I was ordering with "asc" was pushing all the skus in the sortOrder array to the end.
If you reverse the order of the skus in sortOrder and do some math on it, it will sort correctly. Kind of hack-y though. I'd love to know if there's a better way anyone can think of.
{
"sort" : [
{
"_script": {
"type": "number",
"script": {
"inline" : "(9999)-params.sortOrder.indexOf(doc['skuid_text'].value)",
"params": {
"sortOrder": [
"SKUID3",
"SKUID2",
"SKUID1"
]
}
},
"order": "asc"
}
}
],
"query" :{
"bool" : {
"must" : [
{
"term" : {
"category_codes" : "CATEGORY1"
}
}
]
}
}
}
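A less hack-y variant of the same idea is to keep sortOrder in its natural order and send every unlisted SKU to a position after the listed ones, which is what the ELSE branch of the SQL CASE does. The Python sketch below only simulates that sort key on a plain list; the SKU values are placeholders, and carrying the same rule over into the script (replacing the -1 from indexOf with a larger number) is an assumption I have not tested.

```python
def sort_key(sku, sort_order):
    """Position of the SKU in sort_order, or len(sort_order) if unlisted."""
    try:
        return sort_order.index(sku)
    except ValueError:
        return len(sort_order)  # plays the role of ELSE in the SQL CASE


sort_order = ["SKUID1", "SKUID2", "SKUID3"]
skus = ["OTHER", "SKUID3", "SKUID1", "ZZZ", "SKUID2"]

# sorted() is stable, so unlisted SKUs keep their original relative order.
ranked = sorted(skus, key=lambda sku: sort_key(sku, sort_order))
```

With ascending order this puts SKUID1, SKUID2 and SKUID3 first and everything else after them, without reversing the array or subtracting from 9999.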
This plugin did what you need for <5 versions. You could give it a try on your installation as is, or see what the author has to say about a 5.x update.
https://github.com/jprante/elasticsearch-functionscore-conditionalboost