storing boolean values in elasticsearch : optimization? - json

I have json documents with entries like :
......
{
"Fieldname" : "booked",
"Fieldvalue" : "yes"
}
...
Within the json document, there are many fields like this, where Boolean value is indirectly mentioned using Fieldname and Fieldvalue : Essentially it signifies that booked=true. Would it be more efficient to transform the json before storing it in elasticsearch ? I.e. replacing the above with :
{
"booked" : true
}
? The search use case is that I want to figure out whether similar json already exists in the system before adding another json.

Yes the later one is much cleaner way to store and search purpose both. Say you want to get all the booked properties from your index then you can easily do this way instead of using extra Fieldname and Fieldvalue
GET /properties/_search
{
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"country_code.keyword": "US"
}
},
{
"match": {
"booked": true
}
},
{
"range": {
"usd_price": {
"gte": 50,
"lte": 100000
}
}
}
]
}
},
"sort": [
{
"ranking": {
"order": "desc"
}
}
],
"_source": [
"property_id",
"property_name",
"country",
"country_code",
"state",
"state_abbr",
"city"
]
}

Related

Nested inner hits in FreeText search use case

I building a standard free text search on a site that sells cars.
In the search box the user can enter a search word that are passed on to the query where it is used to match both nested and non-nested properties.
I'm using inner_hits to limit the number of variants returned by the query (in this sample variants is not remove from _source)
When matching on a nested property color the inner_hits collection contains the correct variant as expected.
However when matching on a non-nested property title the inner_hits collection is empty. I understand why it's empty.
Can you suggest a better way to structure the query?
Another option would be to always just return at least 1 variant - but how can the be achieved?
Mappings
PUT test
{
"mappings": {
"car": {
"properties": {
"variants": {
"type": "nested"
}
}
}
}
}
Insert data
PUT test/car/1
{
"title": "VW Golf",
"variants": [
{
"color": "red",
"forsale": true
},
{
"color": "blue",
"forsale": false
}
]
}
Query by color
GET test/_search
{
"query": {
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": "blue"
}
},
"inner_hits": {}
}
}
}
Color query: works as expected!
"hits" : [
{
"_source" : {
"title" : "VW Golf",
"variants" : [
{
"color" : "red",
"forsale" : true
},
{
"color" : "blue",
"forsale" : false
}
]
},
"inner_hits" : {
"variants" : {
"hits" : {
"total" : 1,
"hits" : [
{
"_nested" : {
"field" : "variants",
"offset" : 1
},
"_source" : {
"color" : "blue",
"forsale" : false
}
}
]
}
}
}
}
]
Query by brand
GET test/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "golf"
}
},
{
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": "golf"
}
},
"inner_hits": {}
}
}
]
}
}
}
Brand query result :-(
"hits" : [
{
"_source" : {
"title" : "VW Golf",
"variants" : [
{
"color" : "red",
"forsale" : true
},
{
"color" : "blue",
"forsale" : false
}
]
},
"inner_hits" : {
"variants" : {
"hits" : {
"total" : 0,
"hits" : [ ]
}
}
}
}
You already know it but inner_hits returns an empty array because no nested documents matched in the nested query.
A simple solution is to change the query such that the nested query will always match. This can be done by wrapping the nested query into a bool query and add a match_all query.
If you set the boost of the match_all query to 0, it will not contribute to the score. Consequently, if a nested document match it will be first.
Now the inner hits will not be empty, but there is a second problem, all the documents will match. You can either:
set a min_score with a very small value (e.g., 0.00000001) to discard document with a score of 0
duplicate the original nested query and use a minimum_should_match at 2.
{
"query": {
"bool": {
// Ensure that at least 1 of the first 2 queries will match
// The third query will always match
"minimum_should_match": 2,
"should": [
{
"match": {
"title": <SEARCH_TERM>
}
},
{
"nested": {
"path": "variants",
"query": {
"match": {
"variants.color": <SEARCH_TERM>
}
}
}
},
{
"nested": {
"path": "variants",
"query": {
"bool": {
"should": [
{
"match": {
"variants.color": <SEARCH_TERM>
}
},
{
// Disable scoring
"match_all": { "boost": 0 }
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
One way to do it is using a script_fields clause.
You would write a little script in painless that would do the following:
store the List you get from variants in a variable
then iterate over the Maps in this List
if the Map has
color blue you return the Map . (If none evaluate to true you return an empty
Map). This would create an additional field per searchresult with only those variants where the color is blue.
One important drawback is that this is a very heavy operation, especially if you have many records.
You can take this approach if it is something only you will ever do, maybe a few times a year outside peak hours. If your use case is something with regular use and to be performed by many users, I would change the mapping, return variants as a whole or choose some other solution.

Mongodb query on triple nested array of object

I'm having some problem to write a query to return a triple nested value from a document. The documents I'm using are structured like this
{
"areaname": "name1",
"places": [
{
"placename": "place1",
"objects": [
{
"objname": "obj1",
"tags": [
"tag1",
"tag2"
]
},
{
"objname": "obj2",
"tags": [
"tag6",
"tag7"
]
}
]
},
{
"placename": "place2",
"objects": [
{
"objname": "obj45",
"tags": [
"tag46",
"tag34"
]
},
{
"objname": "obj77",
"tags": [
"tag56",
"tag11"
]
}
]
}
]
}
It is quite simple actually but I can't find a solution to a simple query like:
"return the objname of the object that contains tag1 inside their tag"
So for the give document if I use "tag1" as a parameter it is expected for the query to return "obj1"
It should give me the same result if I use "tag2" as a parameter
Other example: using "tag56" it should return only "obj77"
Right now i have no problem returning the whole document using the dot-notation or top level field such as areaname or others
db.users.find( {"places.objects.tags":"tag1"}, { areaname: 1, _id:0 } )
Is this even possible?
Keeping it simple:
[
{
"$match" : {
"places.objects.tags" : "tag1"
}
},
{
"$unwind" : "$places"
},
{
"$unwind" : "$places.objects"
},
{
"$match" : {
"places.objects.tags" : "tag1"
}
},
{
"$group" : {
"_id" : "$_id",
"obj_names" : {
"$push" : "$places.objects.objname"
}
}
}
],
You should add any other fields you want to keep to the group stage,
this can also be done without the double $unwind stage but i choose this for read-ability.

Return selected JSON object from mongo find method

Here is the sample JSON
Sample JSON:
[
{
"_id": "123456789",
"YEAR": "2019",
"VERSION": "2019.Version",
"QUESTION_GROUPS": [
{
"QUESTIONS": [
{
"QUESTION_NAME": "STATE_CODE",
"QUESTION_VALUE": "MH"
},
{
"QUESTION_NAME": "COUNTY_NAME",
"QUESTION_VALUE": "IN"
}
]
},
{
"QUESTIONS": [
{
"QUESTION_NAME": "STATE_CODE",
"QUESTION_VALUE": "UP"
},
{
"QUESTION_NAME": "COUNTY_NAME",
"QUESTION_VALUE": "IN"
}
]
}
]
}
]
Query that am using :
db.collection.find({},
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
})
My requirement is retrive all QUESTION_VALUE whose QUESTION_NAME is equals to STATE_CODE.
Thanks in Advance.
If I get you well, What you are trying to do is something like:
db.collection.find(
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
},
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE": 1
})
Attention: you will get ALL the "QUESTION_VALUE" for ANY document which has a QUESTION_GROUPS.QUESTIONS.QUESTION_NAME with that value.
Attention 2: You will get also the _Id. It is by default.
In case you would like to skip those issues, you may need to use Aggregations, and unwind the "QUESTION_GROUPS"-> "QUESTIONS". This way you can skip both the irrelevant results, and the _id field.
It sounds like you want to unwind the arrays and grab only the question values back
Try this
db.collection.aggregate([
{
$unwind: "$QUESTION_GROUPS"
},
{
$unwind: "$QUESTION_GROUPS.QUESTIONS"
},
{
$match: {
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
}
},
{
$project: {
"QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE": 1
}
}
])

Elasticsearch: how to search a user_defined field?

I use mysql to store user data, and search the data by Elasticsearch. For mysql, I have a user defined field, this field can store a JSON format data.
the example data like this:
data1 =
{
"name" : "test1",
"age": 10,
"user_defined": {
"a" : "aaa",
"b" : "bbb",
"c" : "ccc",
.....
}
}
data2 =
{
"name" : "test2",
"age": 20,
"user_defined": {
"d" : "ddd",
"e" : "eee",
"f" : "fff",
.....
}
}
For user_defined field, the number of keys is not fixed, the type of values all are string, I hope each key can be searched, how to define the mapping? how to search this kind of data by Elasticsearch?
Anyone has good idea?
You can define the mapping of the user_defined as "type": "object", like this:
PUT your_index
{
"mappings": {
"your_type": {
"properties": {
"name": {
"type": "string"
},
"age": {
"type": "integer"
},
"user_defined": {
"type": "object"
}
}
}
}
}
Thereafter, you can index your documents and search them easily with the Query DSL
POST your_index/_search
{
"query": {
"match" : {
"user_defined.a" : "aaa"
}
}
}
Every field in a document in Elasticsearch is indexed and searchable by default.
Elasticsearch provides a Search Lite API and a Query DSL for searching.

Nested filter numerical range

I have the following json object:
{
"Title": "Terminator,
"Purchases": [
{"Country": "US", "Site": "iTunes", "Price": 4.99},
{"Country": "FR", "Site": "Google", "Price": 5.99}
]
}
I want to be able to find an object specifying a Country+Site+PriceRange. For example, the above should return True on Country=US&Price<5.00, but should return False on Country=FR&Price<5.00. How would the index and query look to do this? Here is another answer that this is a follow-up question to: Search within array object.
Simply add a Range query to your Bool query logic tree. This will return documents that match US for country and have the Price field with a numeric value less than 5.
{ "query":
{ "nested" : {
"path" : "Purchases",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"match" : {"Purchases.Country" : "US"}
},
{
"range" : "Purchases.Price":
{
"lte": 5
}
}
]
}
}
}
}
}