Elasticsearch NEST 2.x Field Names - Couchbase

I am upgrading to NEST 2.3.0 and trying to rewrite all queries that were originally written for NEST 1.x.
I am using the Couchbase transport plugin that pushes data from Couchbase to Elasticsearch.
POCO
public class Park
{
public Park()
{
}
public bool IsPublic { get; set; }
}
The mapping looks like this:
"mappings": {
"park": {
"_source": {
"includes": [
"doc.*"
],
"excludes": [
"meta.*"
]
},
"properties": {
"meta": {
"properties": {
"rev": {
"type": "string"
},
"flags": {
"type": "long"
},
"expiration": {
"type": "long"
},
"id": {
"type": "string",
"index": "not_analyzed"
}
}
},
"doc": {
"properties": {
"isPublic": {
"type": "boolean"
}
}
}
}
}
}
Sample document in Elasticsearch:
{
"_index": "parkindex-local-01",
"_type": "park",
"_id": "park_GUID",
"_source": {
"meta": {
"expiration": 0,
"flags": 33554433,
"id": "park_GUID",
"rev": "1-1441a2c278100bc00000000002000001"
},
"doc": {
"isPublic": true,
"id": "park_GUID"
}
}
}
My query in NEST
var termQuery = Query<Park>.Term(p => p.IsPublic, true);
ISearchResponse<T> searchResponse = this.client.Search<T>(s => s.Index("parkindex-local-01")
.Take(size)
.Source(false)
.Query(q => termQuery));
This query is sent to Elasticsearch as follows:
{
"size": 10,
"_source": {
"exclude": [
"*"
]
},
"query": {
"term": {
"isPublic": {
"value": "true"
}
}
}
}
It doesn't retrieve any data; it works only if I prefix the field name with "doc.", so the query becomes:
{
"size": 10,
"_source": {
"exclude": [
"*"
]
},
"query": {
"term": {
"doc.isPublic": {
"value": "true"
}
}
}
}
How do I write the query above in NEST so that it produces the correct field names? I tried using a nested query with the path set to "doc", but that gave an error saying the field is not of a nested type.
Do I need to change my mapping?
This all used to work in Elasticsearch 1.x and NEST 1.x; I guess this has to do with the breaking changes to field name constraints.

Fields can no longer be referenced by shortnames in Elasticsearch 2.0.
isPublic is a property of the doc field, which is mapped as an object type, so referencing the property by its full path is the correct thing to do.
NEST 2.x has some ways to help with field inference. For example:
public class Park
{
public Doc Doc { get; set;}
}
public class Doc
{
public bool IsPublic { get; set;}
}
var termQuery = Query<Park>.Term(p => p.Doc.IsPublic, true);
client.Search<Park>(s => s.Index("parkindex-local-01")
.Take(10)
.Source(false)
.Query(q => termQuery));
results in
{
"size": 10,
"_source": {
"exclude": [
"*"
]
},
"query": {
"term": {
"doc.isPublic": {
"value": true
}
}
}
}
You may also want to take a look at the automapping documentation.

Related

must match URL address returning a lot of documents - Elasticsearch

I'm simply trying to check how many documents have the same link value. There is something weird going on.
Let's say one or more documents have this link value: https://twitter.com/someUser/status/1288024417990144000
I search for it using this JSON query:
/theIndex/_doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"link": "https://twitter.com/someUser/status/1288024417990144000"
}
}
]
}
}
}
It returns 522 of 546 documents, with the first document being the correct one. It acts more like a query_string than a must match.
If I instead search on a more unique field like sha256sum:
{
"query": {
"bool": {
"must": [
{
"match": {
"sha256sum": "dad06b7a0a68a0eb879eaea6e4024ac7f97e38e6ac2b191afa7c363948270303"
}
}
]
}
}
}
It returns 1 document like it should.
I've tried a must with a term query as well, but it returns 0 documents.
Mapping
{
"images": {
"aliases": {},
"mappings": {
"properties": {
"sha256sum": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"link": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "images",
"creation_date": "1593711063075",
"analysis": {
"filter": {
"synonym": {
"ignore_case": "true",
"type": "synonym",
"synonyms_path": "synonyms.txt"
}
},
"analyzer": {
"synonym": {
"filter": [
"synonym"
],
"tokenizer": "keyword"
}
}
},
"number_of_replicas": "1",
"uuid": "a5zMwAYCQuW6U4R8POiaDw",
"version": {
"created": "7050199"
}
}
}
}
}
I wouldn't think such a simple issue would be so hard to fix. Am I just missing something right in front of my eyes?
Does anyone know what might be going on here?
Your link field is mapped as a text field, and text fields are analyzed. If you want to perform an exact match, you need to match on the link.keyword sub-field, and then it's going to behave the way you expect:
{
"query": {
"bool": {
"must": [
{
"match": {
"link.keyword": "https://twitter.com/someUser/status/1288024417990144000"
}
}
]
}
}
}
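To see why the match query hits so many documents, it helps to look at what the standard analyzer does to the URL. The sketch below is a rough Python approximation of its tokenization (the real analyzer follows Unicode word-segmentation rules, so treat this as illustrative only):

```python
import re

def approx_standard_analyzer(text):
    """Rough approximation of Elasticsearch's standard analyzer:
    keep runs of letters/digits/underscores (allowing internal dots,
    as in hostnames) and lowercase everything."""
    tokens = re.findall(r"[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*", text)
    return [t.lower() for t in tokens]

url = "https://twitter.com/someUser/status/1288024417990144000"
print(approx_standard_analyzer(url))
# ['https', 'twitter.com', 'someuser', 'status', '1288024417990144000']
```

A match query on the analyzed link field matches documents containing any of these tokens, and tokens like https and twitter.com appear in almost every link, hence the 522 of 546 hits. The link.keyword sub-field stores the entire URL as one untouched token, which is why matching on it returns only exact matches.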

Cleaner way of iterating nested JSON in ruby

I was wondering if there is any 'cleaner' way of looping through nested JSON in Ruby?
This is my JSON object:
{
"data": [
{
"file": "test/test_project_js/jquery.js",
"results": [
{
"vulnerabilities": [
{
"severity": "high"
},
{
"severity": "medium"
},
{
"severity": "none"
},
{
"severity": "high"
}
]
}
]
},
{
"file": "test/test_project_js/jquery.js",
"results": [
{
"vulnerabilities": [
{
"severity": "none"
},
{
"severity": "none"
},
{
"severity": "none"
},
{
"severity": "high"
}
]
}
]
}
]
}
I want to extract the severity of each vulnerability inside each results[] array, which is under data[].
My current approach is:
severity_arr = raw['data'].each do |data|
data['results'].each do |result|
result['vulnerabilities'].map {|vulnerability| vulnerability['severity']}
end
end
You can use flat_map and dig (this assumes the JSON was parsed with symbolized keys, e.g. JSON.parse(json, symbolize_names: true)):
data[:data].flat_map { |datum| datum.dig(:results, 0, :vulnerabilities) }
# [{:severity=>"high"}, {:severity=>"medium"}, {:severity=>"none"}, {:severity=>"high"}, {:severity=>"none"}, {:severity=>"none"}, {:severity=>"none"}, {:severity=>"high"}]
What's maybe not convenient is that each results element holds an array with a single hash; perhaps a plain hash would be enough there.

How to query an elasticsearch aggregation with a term and sum on different nested objects?

I have the following object whose value attribute is a nested object type:
{
"metadata": {
"tenant": "home",
"timestamp": "2016-03-24T23:59:38Z"
},
"value": [
{ "key": "foo", "int_value": 100 },
{ "key": "bar", "str_value": "taco" }
]
}
This type of object has the following mapping:
{
"my_index": {
"mappings": {
"my_doctype": {
"properties": {
"metadata": {
"properties": {
"tenant": {
"type": "string",
"index": "not_analyzed"
},
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
},
"value": {
"type": "nested",
"properties": {
"str_value": {
"type": "string",
"index": "not_analyzed"
},
"int_value": {
"type": "long"
},
"key": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
With this setup, I would like to perform an aggregation that produces the following result:
Perform a term aggregation on the str_value attribute of objects where the key is set to "bar"
In each bucket created from the above aggregation, calculate the sum of the int_value attributes where the key is set to "foo"
Have the results laid out in a date_histogram for a given time range.
With this goal in mind, I have been able to get the term and date_histogram aggregations to work on my nested objects, but have not had luck performing the second level of calculation. Here is the current query I am attempting to get working:
{
"query": {
"match_all": {}
},
"aggs": {
"filters": {
"filter": {
"bool": {
"must": [
{
"term": {
"metadata.org": "gw"
}
},
{
"range": {
"metadata.timestamp": {
"gte": "2016-03-24T00:00:00.000Z",
"lte": "2016-03-24T23:59:59.999Z"
}
}
}
]
}
},
"aggs": {
"intervals": {
"date_histogram": {
"field": "metadata.timestamp",
"interval": "1d",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-03-24T00:00:00Z",
"max": "2016-03-24T23:59:59Z"
},
"format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
},
"aggs": {
"nested_type": {
"nested": {
"path": "value"
},
"aggs": {
"key_filter": {
"filter": {
"term": {
"value.key": "bar"
}
},
"aggs": {
"groupBy": {
"terms": {
"field": "value.str_value"
},
"aggs": {
"other_nested": {
"reverse_nested": {
"path": "value"
},
"aggs": {
"key_filter": {
"filter": {
"term": {
"value.key": "foo"
}
},
"aggs": {
"amount_sum": {
"sum": {
"field": "value.int_value"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
The result I am expecting to receive from Elasticsearch would look like the following:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"filters": {
"doc_count": 2,
"intervals": {
"buckets": [
{
"key_as_string": "2016-03-24T00:00:00Z",
"key": 1458777600000,
"doc_count": 2,
"nested_type": {
"doc_count": 5,
"key_filter": {
"doc_count": 2,
"groupBy": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "taco",
"doc_count": 1,
"other_nested": {
"doc_count": 1,
"key_filter": {
"doc_count": 1,
"amount_sum": {
"value": 100.0
}
}
}
}
]
}
}
}
}
]
}
}
}
}
However, the innermost aggregation (...groupBy.buckets.key_filter.amount_sum) returns a value of 0.0 instead of 100.0.
I think this is due to the fact that nested objects are indexed as separate documents, so filtering by one key attribute's value is not allowing me to query to against another key.
Would anyone have any idea on how to get this type of query to work?
For a bit more context, the reason for this document structure is that I do not control the content of the JSON documents that get indexed, so different tenants may have conflicting key names with different value types (e.g. {"tenant": "abc", "value": {"foo": "a"} } vs. {"tenant": "xyz", "value": {"foo": 1} }). The method I am trying to use is the one laid out by this Elasticsearch blog post, which recommends transforming objects that you don't control into a structure that you do, and using nested objects to help with this (specifically the Nested fields for each data type section of the article). I would also be open to learning of a better way to handle this situation of not controlling the documents' JSON structure, if there is one, so that I can perform aggregations.
Thank you!
EDIT: I am using Elasticsearch 1.5.
I solved this by using the reverse_nested aggregation in the correct way, as described here: http://www.shayne.me/blog/2015/2015-05-18-elasticsearch-nested-docs/
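For readers who don't want to follow the link: the essence of the fix, as I understand that post, is that the inner reverse_nested must step all the way back to the root document (no path), and then a second nested aggregation re-enters value, so the "foo" filter sees all of that document's nested values. Here is a sketch of the corrected inner aggregation as a Python dict (the aggregation names back_to_root and all_values are my own, not from the original query):

```python
# Sketch of the corrected inner aggregation subtree; it would replace the
# "other_nested" block in the query above.
corrected_inner_aggs = {
    "back_to_root": {
        # no "path": step all the way back out to the parent document
        "reverse_nested": {},
        "aggs": {
            "all_values": {
                # re-enter that parent's nested "value" docs
                "nested": {"path": "value"},
                "aggs": {
                    "key_filter": {
                        "filter": {"term": {"value.key": "foo"}},
                        "aggs": {
                            "amount_sum": {"sum": {"field": "value.int_value"}}
                        }
                    }
                }
            }
        }
    }
}
```

Using reverse_nested with "path": "value" from inside the value nested context, as in the original query, never leaves the matched nested document, which appears to be why the sum came back as 0.0.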

Elasticsearch 2.1 query to get only an element from an array

I have the mapping below; I want to access a single element of imageMap instead of the whole collection.
"imageMap": {
"properties": {
"item1": {
"type": "long"
},
"item2": {
"type": "string"
},
"item3": {
"type": "string"
}
}
}
Below is the sample data
"imageMap": [
{
"item1": 20893,
"item2": "ImageThumbnail_100_By_44",
"item3": "/9j/4AAQSkZJRg"
},
{
"item1": 20893,
"item2": "ImageThumbnail_400_By_244",
"item3": "/9j/4AAQSkZJRgABAQEAYABgAAD/2w"
}
]
Below is my query, which is not working. Any help is appreciated. Thank you in advance.
Updated:
{
"_source": {
"include": [
"imageMap"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"imageMap.item1": {
"value": 20893
}
}
},
{
"term": {
"imageMap.item2": {
"value": "imagethumbnail_100_by_44"
}
}
}
]
}
}
}
The expected result is below (only a single element of imageMap), but I am getting the whole array:
"_source": {
"imageMap": [
{
"item2": "ImageThumbnail_100_By_44",
"item1": 20893,
"item3": "/9j/4AAQSkZJRgABAQ"
}
]
}
If you only want to get a single element from the imageMap array, you need to map imageMap as a nested object, like this:
"imageMap": {
"type": "nested", <--- add this
"properties": {
"item1": {
"type": "long"
},
"item2": {
"type": "string"
},
"item3": {
"type": "string"
}
}
}
Then you need to wipe your index and re-build it from scratch with this new mapping.
In the end, you'll be able to retrieve only a specific element using a nested inner_hits query:
{
"_source": false,
"query" : {
"nested" : {
"path" : "imageMap",
"query" : {
"match" : {"imageMap.item2" : "ImageThumbnail_100_By_44"}
},
"inner_hits" : {}
}
}
}
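With inner_hits, the matching array elements come back under each hit's inner_hits section rather than in _source. A small Python sketch of digging them out — the response dict here is hand-written for illustration (values copied from your sample data), not actual cluster output:

```python
# Hand-written sample of roughly what an inner_hits response looks like;
# by default the inner_hits key matches the nested path ("imageMap").
response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "imageMap": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "item1": 20893,
                                        "item2": "ImageThumbnail_100_By_44",
                                        "item3": "/9j/4AAQSkZJRg",
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

# Collect only the matching imageMap elements across all top-level hits
matching_elements = [
    inner["_source"]
    for hit in response["hits"]["hits"]
    for inner in hit["inner_hits"]["imageMap"]["hits"]["hits"]
]
print(matching_elements[0]["item2"])  # ImageThumbnail_100_By_44
```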
Your query is not working because you are using a term query, which does not do any analysis on your search string. Since you have not specified any analyzer in the mapping, ImageThumbnail_100_By_44 is stored as imagethumbnail_100_by_44, because it is analyzed by the standard analyzer.
Depending on your requirements, you could either map item2 as "index": "not_analyzed", in which case your term query will work fine, or use a match query, which does analysis:
{
"_source": {
"include": [
"imageMap"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"imageMap.item2": {
"query": "ImageThumbnail_100_By_44"
}
}
}
]
}
}
}
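The term-vs-match difference can be sketched in a couple of lines of Python. This is a simplification (the standard analyzer does more than lowercasing), but for a single-token value like this one, lowercasing is the relevant step, since underscores do not split the token:

```python
indexed_token = "ImageThumbnail_100_By_44".lower()  # what the inverted index stores

# term query: the search string is compared verbatim against stored tokens
term_matches = ("ImageThumbnail_100_By_44" == indexed_token)

# match query: the search string is analyzed (lowercased) the same way first
match_matches = ("ImageThumbnail_100_By_44".lower() == indexed_token)

print(term_matches, match_matches)  # False True
```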
Please go through this document for a better understanding of the analysis process.

Filter '_index' the same way as '_type' in a search across multiple indices - Elasticsearch

I have two indices, index1 and index2, and both have two types, type1 and type2, with the same names in Elasticsearch (please assume we have a valid business reason behind this).
I would like to search type1 in index1 and type2 in index2.
Here is my query:
POST _search
{
"query": {
"indices": {
"indices": ["index1","index2"],
"query": {
"filtered":{
"query":{
"multi_match": {
"query": "test",
"type": "cross_fields",
"fields": ["_all"]
}
},
"filter":{
"or":{
"filters":[
{
"terms":{
"_index":["index1"], // how can i make this work?
"_type": ["type1"]
}
},
{
"terms":{
"_index":["index2"], // how can i make this work?
"_type": ["type2"]
}
}
]
}
}
}
},
"no_match_query":"none"
}
}
}
You can use the indices and type filters inside a bool filter to filter on both type and index.
The query would look something along these lines:
POST index1,index2/_search
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "test",
"type": "cross_fields",
"fields": [
"_all"
]
}
},
"filter": {
"bool": {
"should": [
{
"indices": {
"index": "index1",
"filter": {
"type": {
"value": "type1"
}
},
"no_match_filter": "none"
}
},
{
"indices": {
"index": "index2",
"filter": {
"type": {
"value": "type2"
}
},
"no_match_filter": "none"
}
}
]
}
}
}
}
}
Passing the index names in the URL (e.g. index1,index2/_search) is good practice; otherwise you risk executing the query across all indices in the cluster.