Function score with multi-match query

I have an Elasticsearch query that contains a multi_match query:
"multi_match" => [
"query" => "Will Smith",
"type" => "best_fields",
"fields" => [
"title^10",
"description^7",
"keywords",
"name"
],
"operator" => "and"
]
I want to add two function_score queries around the multi_match query: one that gives a higher weight to a multi_match query of type phrase, and one that gives a lower weight to a multi_match query of type best_fields.
In other words, documents that contain the keywords exactly as I searched for them must have a higher _score.
I wrote the query and the function_score inside a bool/must query, but the results did not change.
Does anyone have an idea how to structure my query to get better results?
Thanks.

I can show you how I've done something similar. Take a look at the query below:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "SEARCH TERM HERE",
"fields": [
"title^70",
"description^30",
"content^20"
],
"type": "phrase",
"boost": 100
}
},
{
"multi_match": {
"query": "SEARCH TERM HERE",
"fields": [
"title^30",
"description^25",
"content^10"
],
"type": "most_fields",
"minimum_should_match": "100%",
"boost": 50
}
},
{
"multi_match": {
"query": "SEARCH TERM HERE",
"fields": [
"title^25",
"description^15",
"content^10"
],
"type": "most_fields",
"minimum_should_match": "50%",
"boost": 25
}
},
...
]
}
}
}
The first multi_match matches documents only when the full search phrase is found, and boosts that clause by 100.
The second part requires 100% of the words from the search term. The order of the words does not matter, but all of them must appear in the document. Boost = 50.
The third part requires a 50% match, meaning that not all words have to be in a document for it to be returned. Boost = 25.
The ... part means that I have additional clauses for the rest of the results, but they are not needed in every case.
I chose the boost values myself through trial and error, and they may not suit every case. Keep in mind that there is a fairly complex algorithm behind relevance. For more info, take a look at:
What Is Relevance?
Theory Behind Relevance Scoring
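As a rough illustration, the three tiers above can also be assembled programmatically. This is only a sketch in Python: the helper names multi_match_clause and build_tiered_query are made up for this example, the field boosts are copied from the query above, and the resulting dict is printed rather than sent to a cluster.

```python
import json


def multi_match_clause(term, fields, match_type, boost, minimum_should_match=None):
    """Build one multi_match clause as a plain dict (hypothetical helper)."""
    clause = {"query": term, "fields": fields, "type": match_type, "boost": boost}
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    return {"multi_match": clause}


def build_tiered_query(term):
    """Combine the three tiers from the answer in one bool/should query."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Tier 1: the whole phrase must match; highest boost.
                    multi_match_clause(
                        term, ["title^70", "description^30", "content^20"], "phrase", 100
                    ),
                    # Tier 2: every word must appear, in any order.
                    multi_match_clause(
                        term, ["title^30", "description^25", "content^10"],
                        "most_fields", 50, "100%",
                    ),
                    # Tier 3: at least half of the words must appear.
                    multi_match_clause(
                        term, ["title^25", "description^15", "content^10"],
                        "most_fields", 25, "50%",
                    ),
                ]
            }
        }
    }


body = build_tiered_query("Will Smith")
print(json.dumps(body, indent=2))
```

Because the clauses sit in should, a document that matches the phrase tier usually matches the looser tiers as well, so its score is the sum of several boosted clauses. The boost values still need tuning against your own data.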

Related

Elastic Search - Nested aggregation

I would like to build a nested aggregation query in Elasticsearch. Basically, the aggregation is nested four levels deep:
groupId.keyword
--direction
----billingCallType
------durationCallAnswered
Example:
"aggregations": {
"avgCallDuration": {
"terms": {
"field": "groupId.keyword",
"size": 10000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"call_direction": {
"terms" : {
"field": "direction"
},
"aggregations": {
"call_type" : {
"terms": {
"field": "billingCallType"
},
"aggregations": {
"avg_value": {
"terms": {
"field": "durationCallAnswered"
}
}
}
}
}
}
}
}
}
This is part of a query. When I run it, I get the following error:
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [direction] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
Can anyone throw light on this?
TL;DR
As the error states, you are performing an aggregation on a text field: the field direction.
Aggregations are not supported by default on text fields, as they are very expensive (CPU- and memory-wise).
There are 3 solutions to your issue:
Change the mapping from text to keyword (requires reindexing; the most efficient way to query the data)
Change the mapping to add fielddata: true to this field (flexible, but not optimised)
Don't do the aggregation on this field :)
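A possible variation on the first option, sketched in Python: if direction was indexed with the default dynamic mapping, it may already have a direction.keyword sub-field, just as the question uses groupId.keyword. That sub-field is an assumption here, so check the index mapping first. The helper nested_terms is a made-up name that builds the nested terms aggregations from a list of levels.

```python
import json


def nested_terms(levels):
    """Build nested terms aggregations from (name, field) pairs, outermost first."""
    agg = None
    # Work from the innermost level outwards so each level wraps the previous one.
    for name, field in reversed(levels):
        node = {"terms": {"field": field}}
        if agg is not None:
            node["aggregations"] = agg
        agg = {name: node}
    return agg


levels = [
    ("avgCallDuration", "groupId.keyword"),
    # Assumption: a keyword sub-field exists for direction.
    ("call_direction", "direction.keyword"),
    ("call_type", "billingCallType"),
    ("avg_value", "durationCallAnswered"),
]

aggregations = nested_terms(levels)
print(json.dumps({"aggregations": aggregations}, indent=2))
```

If billingCallType is also mapped as text, the same error would appear for that level and the same fix would apply.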

Using Power Query to extract data from nested arrays in JSON

I'm relatively new to Power Query, but I'm pulling in this basic structure of JSON from a web api
{
"report": "Cost History",
"dimensions": [
{
"time": [
{
"name": "2019-11",
"label": "2019-11",
…
},
{
"name": "2019-12",
"label": "2019-12",
…
},
{
"name": "2020-01",
"label": "2020-01",
…
},
…
]
},
{
"Category": [
{
"name": "category1",
"label": "Category 1",
…
},
{
"name": "category2",
"label": "Category 2",
…
},
…
]
}
],
"data": [
[
[
40419.6393798211
],
[
191.44
],
…
],
[
[
2299.652439184997
],
[
0.0
],
…
]
]
}
I actually have 112 categories and 13 "times". I figured out how to do multiple queries to turn the times into column headers and the categories into row labels (I think). But the data section is eluding me. Because each item is a list within a list, I'm not sure how to expand it all out. Each object in the data array will have 112 numbers, and there will be 13 objects, if that all makes sense.
So ultimately I want to make it look like
            2019-11  2019-12  2020-01  ...
Category 1  40419    2299
Category 2  191      0
...
First time asking a question on here, so hopefully this all makes sense and is clear. Thanks in advance for any help!
I am also researching this exact thing and looking for a solution. In PQ, nested arrays are displayed as a List, and there is a function to extract the values using a separator character of your choice. Applying it generates this step:
= Table.TransformColumns(#"Filtered Rows", {"aligned_to_ids", each Text.Combine(List.Transform(_, Text.From), ","), type text})
However, the problem I'm trying to solve is when each item in the nested JSON has multiple values. When these lists are extracted, the step
= Table.TransformColumns(#"Extracted Values1", {"collaborators", each Text.Combine(List.Transform(_, Text.From), ","), type text})
fails with an error message:
Expression.Error: We cannot convert a value of type Record to type Text.
Details:
Value=
id=15890
goal_id=323
role_id=15
Type=[Type]
It seems the multiple values are not handled and PQ does not recognise the underlying structure to enable the columns to be expanded.
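For the report JSON at the top of this thread, the reshaping logic itself can be sketched outside Power Query. The Python below is only an illustration of the index arithmetic, not a Power Query solution. It assumes, as the question describes, that the outer data list has one entry per time period, each holding one single-element list per category. The function name pivot_report and the trimmed sample are made up for this example.

```python
def pivot_report(report):
    """Return {category label: {time label: value}} from the nested data lists."""
    times = [t["label"] for t in report["dimensions"][0]["time"]]
    categories = [c["label"] for c in report["dimensions"][1]["Category"]]
    table = {}
    for c_idx, category in enumerate(categories):
        # data[t_idx][c_idx] is a one-element list holding the value.
        table[category] = {
            time: report["data"][t_idx][c_idx][0]
            for t_idx, time in enumerate(times)
        }
    return table


# Trimmed sample: two time periods and two categories, values from the question.
sample = {
    "report": "Cost History",
    "dimensions": [
        {"time": [{"name": "2019-11", "label": "2019-11"},
                  {"name": "2019-12", "label": "2019-12"}]},
        {"Category": [{"name": "category1", "label": "Category 1"},
                      {"name": "category2", "label": "Category 2"}]},
    ],
    "data": [
        [[40419.6393798211], [191.44]],
        [[2299.652439184997], [0.0]],
    ],
}

table = pivot_report(sample)
for category, row in table.items():
    print(category, row)
```

The same idea carries over to Power Query: pair each position in the outer list with a time label and each position in the inner list with a category label, then take the first element of the innermost list.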

Testing an utterance: comparison to "published" produces JSON string completely different from results obtained by querying the API

I just trained my LUIS application and published it to production. If I test it on an utterance, I can see how that result compares to the published version and look at the JSON result. The problem is I'm getting a completely different JSON result there than I get when I query the API via its URL. Here is the test result JSON:
{
"query": "please show me *johnson*",
"prediction": {
"normalizedQuery": "please show me *johnson*",
"topIntent": "Show",
"intents": {
"Show": {
"score": 0.985523641
}
},
"entities": {
"ShowObject": [
"*johnson*"
],
"$instance": {
"ShowObject": [
{
"type": "ShowObject",
"text": "*johnson*",
"startIndex": 15,
"length": 9,
"score": 0.8382344,
"modelTypeId": 1,
"modelType": "Entity Extractor",
"recognitionSources": [
"model"
]
}
]
}
}
}
}
and here is the API query result:
{
"query": "please show me *johnson*",
"topScoringIntent": {
"intent": "Show",
"score": 0.985523641
},
"intents": [
{
"intent": "Show",
"score": 0.985523641
}
],
"entities": [
{
"entity": "* johnson *",
"type": "ShowObject",
"startIndex": 15,
"endIndex": 23,
"score": 0.8382344
}
]
}
The problem with the API query result is that it doesn't return enough information about the entity, and it returns a different entity than the test result. Note above that the test result returns *johnson* with no spaces near the asterisks, which is how the original query is, but the API query result returns * johnson * with spaces near the asterisks. I don't want it to put the spaces in, so I prefer the test result over the API query result.
Why are they different, and how do I get the API query to return a result like the test, i.e. with no modification of the input string to add spaces near the asterisks?
Here is the API query URL including parameters:
https://westus.api.cognitive.microsoft.com/luis/v2.0/apps/[app ID removed]?q=please+show+me+*johnson*&timezoneOffset=0&verbose=true&spellCheck=false&staging=false
Oh, I see now. This is probably to represent a wildcard search, right? If so, I'm not personally aware of a way to strip this out of a LUIS response, and I think I've seen something similar when there's an #mention in there as well. However, if this is to facilitate searches, such that you know there's a good chance of a * before and/or after the "ShowObject" entity, then it should be easy enough to test for this and replace it, using either a string or a regex replacement (removing the space next to the star, I mean; I realise you need the star itself). Basically, you'd replace "[star][space]" with "[star]", and do the same in reverse at the end. Not pretty, but workable and simple to implement...
Just out of interest, do you anticipate * in the middle of a string as well?
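A minimal sketch of that replacement in Python, assuming the star only needs fixing at the very start and very end of the entity text. The function name strip_star_padding is made up for this example, and stars in the middle of the string are deliberately left alone.

```python
import re


def strip_star_padding(entity):
    """Remove whitespace right after a leading * and right before a trailing *."""
    entity = re.sub(r"^\*\s+", "*", entity)   # "* johnson" -> "*johnson"
    entity = re.sub(r"\s+\*$", "*", entity)   # "johnson *" -> "johnson*"
    return entity


cleaned = strip_star_padding("* johnson *")
print(cleaned)
```

This would be applied to the entity value from the v2 API response before it is used as a search pattern.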

Querying Microsoft Academic graph by fields of study of references

I've been playing around with the Microsoft Academic API, trying to do graph queries using JSON formatted queries. I'm at the point where I think I can produce results, but for some reason I don't get the full set of results.
The query I am attempting to perform will retrieve all papers that reference a paper that has a FieldOfStudy that is one of the ones I'm looking for. Essentially, I'm trying to find out how well cited a field of study is.
I think the query should look something like this:
{
"path": "/paper/ReferenceIDs/reference/FieldOfStudyIDs/field",
"paper": {
"type": "Paper",
"match" : {
"PublishYear": 2017
},
"select": ["DOI","OriginalTitle","PublishYear"]
},
"reference" : {
"type" : "Paper",
"select" : "OriginalTitle"
},
"field": {
"type": "FieldOfStudy",
"select": [ "Name" ],
"return": { "id": [106686826,204641814] }
}
}
Unfortunately, I get only an incomplete subset of results. Funnily enough, though, if I further restrict the initial node by matching on a title, I get another set of results (disjoint from the first query's result set):
{
"path": "/paper/ReferenceIDs/reference/FieldOfStudyIDs/field",
"paper": {
"type": "Paper",
"match" : {
"OriginalTitle": "cancer",
"PublishYear": 2017
},
"select": ["DOI","OriginalTitle","PublishYear"]
},
"reference" : {
"type" : "Paper",
"select" : "OriginalTitle"
},
"field": {
"type": "FieldOfStudy",
"select": [ "Name" ],
"return": { "id": [106686826,204641814] }
}
}
So, what could be going on here? Is the query giving up because the very first node it hits on the broader search doesn't match the path? Is it even possible to query all papers published in a year like this?

How to use full-text search with the ArangoDB REST API? I'm always getting an empty result

Data in Arango:
{
"employees": [
{
"lastName": "Ansari",
"firstName": "Haseb"
},
{
"lastName": "Ansari",
"firstName": "Affan"
},
{
"lastName": "Keshav",
"firstName": "Anil"
}
],
"_id": "test/124518952473",
"_rev": "124518952473",
"_key": "124518952473"
}
Indexing:
POST http://localhost:8529/_db/db_test/_api/index?collection=test
Body:
{
"type" : "fulltext",
"fields" : [
"lastName"
]
}
Searching:
PUT http://localhost:8529/_db/db_test/_api/simple/fulltext
Body:
{
"collection" : "test",
"attribute" : "lastName",
"query" : "Ansari"
}
I want to use the REST API for full-text search in my application. Please help me find where I am going wrong. This is just one example document in the Arango store; I will have more documents, hence the full-text search.
Short: Your index is not on a field in your document.
Long:
You're saving one document. It has a list employees, but no top-level attribute lastName. That attribute is inside employees, so the index path won't match.
You can, however, make it work if you put the fulltext index on employees; then all attributes of the objects in employees will be indexed. You will then match on both first and last name, though.
If you want to handle these separately, you need to directly match a token like this:
employees[0].lastName
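Another option, which is my own suggestion and not part of the answer above, is to store one document per employee so that lastName becomes a top-level attribute the index on lastName can see. The sketch below, in Python, only reshapes the data locally and does not talk to ArangoDB. The function name split_employees is made up for this example.

```python
def split_employees(doc):
    """Return one flat document per entry in doc['employees']."""
    return [dict(employee) for employee in doc.get("employees", [])]


original = {
    "employees": [
        {"lastName": "Ansari", "firstName": "Haseb"},
        {"lastName": "Ansari", "firstName": "Affan"},
        {"lastName": "Keshav", "firstName": "Anil"},
    ]
}

# The original document has no top-level lastName, so the index has nothing to match.
print("lastName" in original)

flat_docs = split_employees(original)
for flat in flat_docs:
    print(flat)
```

Each resulting dict could then be saved as its own document in the test collection, and the existing index and query on lastName would apply to it.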