Elasticsearch NEST: incorrect function_score JSON generation for field_value_factor

I am trying to build the following function for the function_score search query:
{
  "filter": {
    "range": {
      "availabilityAverage": {
        "gt": 0
      }
    }
  },
  "field_value_factor": {
    "field": "availabilityAverage",
    "factor": 1,
    "modifier": "log1p"
  },
  "weight": 100
}
This is my current .NET code:
.FieldValueFactor(ff => ff
.Field(fff => fff.StandardPriceMin)
.Factor(2)
.Modifier(FieldValueFactorModifier.Log1P)
.Weight(100)
.Filter(faf => faf
.Range(r => r
.Field(rf => rf.AvailabilityAverage)
.GreaterThan(0.0)
)
)
)
However, this is the JSON that NEST generates:
{
  "filter": {
    "range": {
      "availabilityAverage": {
        "gt": 0.0
      }
    }
  },
  "field_value_factor": {
    "factor": 2.0,
    "field": "standardPriceMin",
    "modifier": "log1p",
    "filter": {
      "range": {
        "availabilityAverage": {
          "gt": 0.0
        }
      }
    },
    "weight": 100.0
  },
  "weight": 100.0
}
It correctly adds filter and weight outside of field_value_factor, but it also includes filter and weight inside it as child elements. This does not happen with other functions such as RandomScore() written in exactly the same format; it only happens with field_value_factor.
I tried several different combinations, but none of them produced the expected result. Is it normal for NEST to generate this JSON?
Thanks in advance.

It looks like there's a bug in how IFieldValueFactorFunction is serialized, which causes filter and weight to be included twice: outside of "field_value_factor" and inside it. I've opened a pull request to address it.
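The Python sketch below makes the intended shape explicit. It is not NEST code, and the helper names are hypothetical. It builds the function entry from the question and includes a check that flags the duplicated keys. Only field, factor and modifier belong inside "field_value_factor". The filter and weight are siblings of it on the function entry.

```python
import json


def field_value_factor_function(field, factor, modifier, weight=None, filter_=None):
    """Build one entry of a function_score "functions" array (hypothetical helper)."""
    function = {
        "field_value_factor": {
            "field": field,
            "factor": factor,
            "modifier": modifier,
        }
    }
    # filter and weight live on the function entry, next to field_value_factor
    if filter_ is not None:
        function["filter"] = filter_
    if weight is not None:
        function["weight"] = weight
    return function


def has_leaked_keys(function):
    """True if filter/weight were also serialized inside field_value_factor."""
    inner = function.get("field_value_factor", {})
    return "filter" in inner or "weight" in inner


expected = field_value_factor_function(
    field="availabilityAverage",
    factor=1,
    modifier="log1p",
    weight=100,
    filter_={"range": {"availabilityAverage": {"gt": 0}}},
)
print(json.dumps(expected, indent=2))
```

Running has_leaked_keys on the NEST output from the question returns True, because the field_value_factor object there contains its own "filter" and "weight".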


JMESPath: select a JSON object element based on another (array) element in the object

I have this JSON:
{
"srv_config": [{
"name": "db1",
"servers": ["srv1", "srv2"],
"prop": [{"source":"aa"},"destination":"bb"},{"source":"cc"},"destination":"cc"},]
}, {
"name": "db2",
"servers": ["srv2", "srv2"],
"prop": [{"source":"dd"},"destination":"dd"},{"source":"ee"},"destination":"ee"},]
}
]
}
I am trying to build a JMESPath expression that selects the prop array of each object in the main array, but only for objects whose servers list contains a given string.
To select all props, I can do:
*.props [*]
But how do I add condition that says "select only if srv1 is in servers list"?
You can use the contains function to filter on whether an array contains a given value.
Given the query:
*[?contains(servers, `srv1`)].prop | [][]
This gives us:
[
  {
    "source": "aa",
    "destination": "bb"
  },
  {
    "source": "cc",
    "destination": "cc"
  }
]
Note that I am also flattening the result here (the | [][] part).
All of this was run against a corrected version of your JSON (the prop arrays in the question are not valid JSON):
{
  "srv_config": [
    {
      "name": "db1",
      "servers": [
        "srv1",
        "srv2"
      ],
      "prop": [
        {
          "source": "aa",
          "destination": "bb"
        },
        {
          "source": "cc",
          "destination": "cc"
        }
      ]
    },
    {
      "name": "db2",
      "servers": [
        "srv2",
        "srv2"
      ],
      "prop": [
        {
          "source": "dd",
          "destination": "dd"
        },
        {
          "source": "ee",
          "destination": "ee"
        }
      ]
    }
  ]
}
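As a cross-check of what the query returns, here is the same selection written in plain Python over the corrected document. It is a sketch that assumes srv_config is the only top-level key; the * projection in the query would visit every top-level value. The name props_for_server is hypothetical.

```python
# Corrected document from above, as a Python dict
data = {
    "srv_config": [
        {
            "name": "db1",
            "servers": ["srv1", "srv2"],
            "prop": [
                {"source": "aa", "destination": "bb"},
                {"source": "cc", "destination": "cc"},
            ],
        },
        {
            "name": "db2",
            "servers": ["srv2", "srv2"],
            "prop": [
                {"source": "dd", "destination": "dd"},
                {"source": "ee", "destination": "ee"},
            ],
        },
    ]
}


def props_for_server(document, server):
    """Plain-Python analogue of: *[?contains(servers, `server`)].prop | [][]"""
    result = []
    for entry in document["srv_config"]:
        # the filter step: keep entries whose servers list contains the value
        if server in entry["servers"]:
            # .prop followed by flattening: append each prop object individually
            result.extend(entry["prop"])
    return result


print(props_for_server(data, "srv1"))
```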

Insert a whole JSON document into a DynamoDB field using AWS Step Functions

I am trying to PutItem into the DynamoDB using the AWS Step Functions.
I managed to save an item with simple string (S) fields, but one of the fields should store the whole JSON payload, so it should be a Map (M).
However, my payload also contains nested maps.
Example JSON:
{
"firstMap": {
"field": "something",
},
"secondMap": {
"nestedMap": {
"field": "something",
},
"anotherNestedMap": [
{
"field": "something",
"oneMoreNestedMap": {
"andOneMore": {
"field": "something",
},
"arrayComesHere": [
{
"andAgainNestedMap": {
"field": "something",
},
"andAgain": [
{
"field": "something",
"alsoNestedArray": [
{
"field": "something"
}
]
}
]
}
]
},
"letItBeFinalOne": [
{
"field": "something"
}
]
}
]
...
What I want is to simply tell Step Functions: please insert this whole JSON into the item field, like this:
"Item": {
...
"whole_payload": {
"M.$": "$"
},
} ...
But it fails, because it only handles a single, top-level Map.
So I would need to iterate over all the nested maps myself and mark each of them with 'M'.
Is there a way to make it resolve the nested structure by itself?
In TypeScript, for example, I can use aws.DynamoDB.DocumentClient(), put the whole JSON into the field, and it resolves all the maps by itself.
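For comparison, the marshalling that DocumentClient performs on the client side can be sketched in a few lines of Python. This is a hypothetical helper, not an AWS API. One assumption is that you would run something like it in a Lambda task before the PutItem step and reference its output from the PutItem parameters. The mapping is: dicts to M, lists to L, strings to S, numbers to N, booleans to BOOL and None to NULL.

```python
def to_attribute_value(value):
    """Recursively wrap a plain JSON value in DynamoDB AttributeValue type markers."""
    # bool must be checked before int/float, because bool is a subclass of int
    if isinstance(value, bool):
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        # DynamoDB transmits numbers as strings under the N type
        return {"N": str(value)}
    if isinstance(value, dict):
        return {"M": {key: to_attribute_value(item) for key, item in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(item) for item in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_attribute_value({"firstMap": {"field": "something"}}))
```

Every nested level gets its own marker this way, which is exactly the manual 'M'-tagging described in the question.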
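I came across a thread with a similar request to the AWS Step Functions team. A newer feature, the States.JsonToString intrinsic function, allows something close to what you are looking for: the whole payload is stored as a single JSON string (S) attribute rather than as a nested Map (M).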
Sample snippet:
...
"Parameters": {
"TableName" : "dynamodb-table",
"Item":{
"requestId" : {
"S.$": "$.requestId"
},
"payload": {
"S.$":"States.JsonToString($)"
}
}
...
AWS Reference
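The trade-off of the JsonToString approach is that the attribute holds text rather than a queryable Map, so whoever reads the item has to decode it. Below is a minimal Python sketch of that round trip, assuming an item shaped like the Parameters above. The helper names are hypothetical, and json.dumps with compact separators is only an approximation of what States.JsonToString emits.

```python
import json


def wrap_payload(request_id, payload):
    """Mirror the Parameters above: requestId as S, whole payload as one JSON string."""
    return {
        "requestId": {"S": request_id},
        "payload": {"S": json.dumps(payload, separators=(",", ":"))},
    }


def unwrap_payload(item):
    """Readers must parse the S attribute back into nested objects themselves."""
    return json.loads(item["payload"]["S"])


item = wrap_payload("req-1", {"a": {"b": [1, 2]}})
print(item)
print(unwrap_payload(item))
```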

Storing boolean values in Elasticsearch: optimization?

I have JSON documents with entries like:
...
{
"Fieldname" : "booked",
"Fieldvalue" : "yes"
}
...
Within the JSON document there are many fields like this, where a Boolean value is expressed indirectly through Fieldname and Fieldvalue: essentially, the entry above means booked=true. Would it be more efficient to transform the JSON before storing it in Elasticsearch, i.e. to replace the above with the following?
{
  "booked": true
}
The search use case is that I want to find out whether a similar JSON document already exists in the system before adding another one.
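If you do transform before indexing, the conversion is a small preprocessing step. The Python sketch below makes these assumptions:

- each document carries a list of Fieldname/Fieldvalue entries;
- "yes"/"no" (and "true"/"false") are the only spellings used for booleans;
- flatten_fields is a hypothetical helper, and any other value is kept as-is.

```python
# Assumed boolean spellings; extend these sets to match your data
TRUTHY = {"yes", "true"}
FALSY = {"no", "false"}


def flatten_fields(entries):
    """Turn [{"Fieldname": ..., "Fieldvalue": ...}, ...] into one flat document."""
    doc = {}
    for entry in entries:
        name = entry["Fieldname"]
        raw = entry["Fieldvalue"]
        normalized = str(raw).strip().lower()
        if normalized in TRUTHY:
            doc[name] = True
        elif normalized in FALSY:
            doc[name] = False
        else:
            # not a recognized boolean spelling: keep the original value
            doc[name] = raw
    return doc


print(flatten_fields([{"Fieldname": "booked", "Fieldvalue": "yes"}]))
```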
Yes, the latter is a much cleaner way, both for storage and for search. For example, if you want to get all the booked properties from your index, you can query the booked field directly instead of going through the extra Fieldname and Fieldvalue fields:
GET /properties/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "country_code.keyword": "US"
          }
        },
        {
          "match": {
            "booked": true
          }
        },
        {
          "range": {
            "usd_price": {
              "gte": 50,
              "lte": 100000
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "ranking": {
        "order": "desc"
      }
    }
  ],
  "_source": [
    "property_id",
    "property_name",
    "country",
    "country_code",
    "state",
    "state_abbr",
    "city"
  ]
}

Return selected JSON objects from the MongoDB find method

Here is a sample JSON document:
[
  {
    "_id": "123456789",
    "YEAR": "2019",
    "VERSION": "2019.Version",
    "QUESTION_GROUPS": [
      {
        "QUESTIONS": [
          {
            "QUESTION_NAME": "STATE_CODE",
            "QUESTION_VALUE": "MH"
          },
          {
            "QUESTION_NAME": "COUNTY_NAME",
            "QUESTION_VALUE": "IN"
          }
        ]
      },
      {
        "QUESTIONS": [
          {
            "QUESTION_NAME": "STATE_CODE",
            "QUESTION_VALUE": "UP"
          },
          {
            "QUESTION_NAME": "COUNTY_NAME",
            "QUESTION_VALUE": "IN"
          }
        ]
      }
    ]
  }
]
Query that I am using:
db.collection.find({},
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
})
My requirement is to retrieve every QUESTION_VALUE whose QUESTION_NAME equals STATE_CODE.
Thanks in Advance.
If I understand you correctly, what you are trying to do is something like:
db.collection.find(
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
},
{
"QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE": 1
})
Note: you will get ALL the "QUESTION_VALUE" entries of ANY document that has a QUESTION_GROUPS.QUESTIONS.QUESTION_NAME with that value, not only the matching ones.
Note 2: you will also get the _id field, which is included by default.
If you want to avoid both issues, you may need to use an aggregation and unwind "QUESTION_GROUPS" and then "QUESTIONS". This way you can skip the irrelevant results, and you can exclude the _id field in the final projection.
It sounds like you want to unwind the arrays and get back only the question values.
Try this:
db.collection.aggregate([
  {
    $unwind: "$QUESTION_GROUPS"
  },
  {
    $unwind: "$QUESTION_GROUPS.QUESTIONS"
  },
  {
    $match: {
      "QUESTION_GROUPS.QUESTIONS.QUESTION_NAME": "STATE_CODE"
    }
  },
  {
    $project: {
      _id: 0, // exclude _id, which $project otherwise keeps by default
      "QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE": 1
    }
  }
])
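To see what each stage produces without a running database, here is a rough plain-Python emulation of that pipeline on the sample document. It runs these stages in order:

1. unwind QUESTION_GROUPS;
2. unwind QUESTIONS;
3. match STATE_CODE;
4. project only QUESTION_VALUE, without _id.

The helpers are hypothetical and ignore $unwind options such as preserveNullAndEmptyArrays.

```python
import copy

docs = [{
    "_id": "123456789",
    "YEAR": "2019",
    "VERSION": "2019.Version",
    "QUESTION_GROUPS": [
        {"QUESTIONS": [
            {"QUESTION_NAME": "STATE_CODE", "QUESTION_VALUE": "MH"},
            {"QUESTION_NAME": "COUNTY_NAME", "QUESTION_VALUE": "IN"},
        ]},
        {"QUESTIONS": [
            {"QUESTION_NAME": "STATE_CODE", "QUESTION_VALUE": "UP"},
            {"QUESTION_NAME": "COUNTY_NAME", "QUESTION_VALUE": "IN"},
        ]},
    ],
}]


def get_path(doc, path):
    for key in path.split("."):
        doc = doc[key]
    return doc


def set_path(doc, path, value):
    *parents, last = path.split(".")
    for key in parents:
        doc = doc[key]
    doc[last] = value


def unwind(stage, path):
    """One output document per array element, like $unwind (empty arrays yield nothing)."""
    out = []
    for doc in stage:
        for element in get_path(doc, path):
            clone = copy.deepcopy(doc)
            set_path(clone, path, element)
            out.append(clone)
    return out


def state_code_values(stage):
    stage = unwind(stage, "QUESTION_GROUPS")
    stage = unwind(stage, "QUESTION_GROUPS.QUESTIONS")
    stage = [d for d in stage
             if get_path(d, "QUESTION_GROUPS.QUESTIONS.QUESTION_NAME") == "STATE_CODE"]
    # $project: keep only QUESTION_VALUE and drop _id
    return [{"QUESTION_GROUPS": {"QUESTIONS": {
        "QUESTION_VALUE": get_path(d, "QUESTION_GROUPS.QUESTIONS.QUESTION_VALUE")}}}
        for d in stage]


print(state_code_values(docs))
```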

Elasticsearch: even distribution of results on a map

I'm using Elasticsearch 2. I have a large database of locations, each of which has a gps attribute of type geo_point.
My frontend application displays a Google Maps component with the results filtered by my query (let's say "pizza"). The problem is that the dataset has grown a lot, and the client wants the results spread evenly across the map.
So if I search for something in New York, I would like results from all over New York, but I currently receive all 400 results in one populous area of Manhattan.
My naive approach was to just filter by distance:
{
  "size": 400,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "gps": [
            -73.98502023369585,
            40.76195656809083
          ]
        }
      }
    }
  }
}
This doesn't guarantee that the results will be spread across the map.
How can I do it?
I've also tried using a geo-distance aggregation for this:
{
  "size": 400,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "gps": [
            -73.98502023369585,
            40.76195656809083
          ]
        }
      }
    }
  },
  "aggs": {
    "per_ring": {
      "geo_distance": {
        "field": "gps",
        "unit": "km",
        "origin": [
          -73.98502023369585,
          40.76195656809083
        ],
        "ranges": [
          {
            "from": 0,
            "to": 100
          },
          {
            "from": 100,
            "to": 200
          }
        ]
      }
    }
  }
}
But I just receive a list of hits plus the number of documents in each bucket. The hits themselves are still not guaranteed to be spread out:
"aggregations": {
  "per_ring": {
    "buckets": [
      {
        "key": "*-100.0",
        "from": 0,
        "from_as_string": "0.0",
        "to": 100,
        "to_as_string": "100.0",
        "doc_count": 33821
      },
      {
        "key": "100.0-200.0",
        "from": 100,
        "from_as_string": "100.0",
        "to": 200,
        "to_as_string": "200.0",
        "doc_count": 6213
      }
    ]
  }
}
I would like to grab half of the results from one bucket, half from the other bucket.
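If you specifically want a fixed share of hits from each distance ring, one client-side workaround is to over-fetch hits and rebalance them yourself. This is not a single Elasticsearch query, and the helper names are hypothetical. The sketch below makes these assumptions:

- each hit's gps is a [lon, lat] array, as in the queries above;
- the rings are the same 0-100 km and 100-200 km ranges;
- hits are taken round-robin, so each ring contributes equally while it still has hits left.

```python
import math
from itertools import zip_longest

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def balance_by_ring(hits, origin, rings, size):
    """Pick up to `size` hits, alternating between distance rings [lo, hi) in km."""
    buckets = [[] for _ in rings]
    for hit in hits:
        lon, lat = hit["gps"]
        distance = haversine_km(origin[0], origin[1], lon, lat)
        for i, (lo, hi) in enumerate(rings):
            if lo <= distance < hi:
                buckets[i].append(hit)
                break  # hits outside every ring are dropped
    picked = []
    # zip_longest pads exhausted rings with None, so other rings fill the gap
    for row in zip_longest(*buckets):
        for hit in row:
            if hit is not None:
                picked.append(hit)
                if len(picked) == size:
                    return picked
    return picked
```

With size 400 and enough hits in both rings, this returns 200 from each ring.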
I've also tried the geohash grid aggregation, but it doesn't give me sample results for each bucket either; it only provides the areas.
So how can I get results evenly distributed across my map with a single Elasticsearch query?
Thanks!
I think introducing some randomness may give you the desired result. I assume you're seeing the same distribution because of index ordering: you're not scoring based on distance, and you're taking the first 400 hits, so you are most likely getting the same result set every time.
{
  "size": 400,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "match_all": {}
            }
          ],
          "filter": {
            "geo_distance": {
              "distance": "200km",
              "gps": [
                -73.98502023369585,
                40.76195656809083
              ]
            }
          }
        }
      },
      "functions": [
        {
          "random_score": {}
        }
      ]
    }
  }
}
Random score in elastic