Tag Field search scoring in RediSearch - redisearch

I am performing a tag-only RediSearch query like this:
'@tags:{Food|Restaurant|Chicago}'
I would like to sort the result based on number of successful matches of these tags in the document.
So that:
a document containing all 3 tags would rank first
a document matching only 2 of these would rank second
a document matching only 1 would rank third.
Is this possible? If so, how can I perform such a sort?
Thanks!

In its current state, this isn't directly in the feature set of RediSearch. Matching tags are not counted towards the score. An example:
127.0.0.1:6379> ft.create test SCHEMA mytags TAG
OK
127.0.0.1:6379> ft.add test one 1 FIELDS mytags "aa"
OK
127.0.0.1:6379> ft.add test two 1 FIELDS mytags "aa,bb"
OK
127.0.0.1:6379> ft.add test three 1 FIELDS mytags "aa,bb,cc"
OK
127.0.0.1:6379> ft.add test four 1 FIELDS mytags "aa,bb"
OK
When you query, add the WITHSCORES argument:
127.0.0.1:6379> FT.SEARCH test "@mytags:{aa|bb|cc}" WITHSCORES
 1) (integer) 4
 2) "four"
 3) "inf"
 4) 1) "mytags"
    2) "aa,bb"
 5) "three"
 6) "inf"
 7) 1) "mytags"
    2) "aa,bb,cc"
 8) "two"
 9) "inf"
10) 1) "mytags"
    2) "aa,bb"
11) "one"
12) "inf"
13) 1) "mytags"
    2) "aa"
The "inf" indicates that no score is attributed to the match, so the order of the results does not reflect any scoring.
I also attempted to do this with the aggregation engine and could not find a way to create a pipeline that would effectively count matched tags.
This seems like a straightforward use case, so I did file an issue for it.
A couple of workarounds:
You may be able to use a TEXT field and optional clauses ("~Food|~Restaurant|~Chicago"), but a text field will have slightly different properties.
RedisGears could be used to compose a few queries into a single result.
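A further option, not mentioned above, is to re-rank on the client: run the tag query as usual, then count how many of the requested tags each returned document carries and sort by that count. A minimal Python sketch, assuming the results have already been fetched into a dict of document id to raw tag string (rank_by_tag_matches and the dict layout are hypothetical, not a RediSearch API):

```python
def rank_by_tag_matches(docs, wanted, separator=","):
    """Return (doc_id, match_count) pairs, most matches first.

    docs   -- hypothetical mapping of document id -> raw TAG field value
    wanted -- the tags that were used in the query
    """
    # Assumption: tags are compared case-insensitively after trimming
    wanted = {tag.strip().lower() for tag in wanted}
    counted = []
    for doc_id, raw in docs.items():
        tags = {tag.strip().lower() for tag in raw.split(separator)}
        counted.append((doc_id, len(tags & wanted)))
    # sorted() is stable, so documents with equal counts keep their input order
    return sorted(counted, key=lambda pair: pair[1], reverse=True)


docs = {"one": "aa", "two": "aa,bb", "three": "aa,bb,cc", "four": "aa,bb"}
ranking = rank_by_tag_matches(docs, ["aa", "bb", "cc"])
print(ranking)
```

With the four sample documents from the example, "three" comes first, "two" and "four" tie in second place, and "one" is last. This only re-orders what the query returned, so it would need care with LIMIT and large result sets.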

Related

Can I find item on database with given array of string but using partial string?

I have model Product with name and category
On seed:
Product.create(name: "apple", category: "food")
Product.create(name: "biogesic", category: "medicine")
And a 2 dimensional array:
[[1, "tray of apple", 150.00], [1, "box of ballpen", 70.30]]
What I need is to check whether a string inside the array contains, or matches, a name in the Product table.
Here's what I'm thinking but I'm lost:
isProduct = Product.where("name like ?", "%an apple%").first
Here the pattern is supposed to come from the string in the array, but written this way it is limited to one word only.
I don't need the product id, I just need it if it is on the Product table.
In a certain way, this can be accomplished with the Regular Expression Operator ~ for PostgreSQL and/or REGEXP for MySQL:
regex = array.flat_map { |_, sentence, _| sentence.split }.join('|')
Product.exists?(['name ~ ?', regex])
Product.exists?(['name REGEXP ?', regex])
Which produces:
SELECT 1 AS one FROM "products" WHERE (name ~ 'tray|of|apple|box|of|ballpen') LIMIT $1 [["LIMIT", 1]]
It matches any name that contains at least one of the words from the sentences "tray of apple" and "box of ballpen".
So, in case you have a record like:
Product.new(name: 'tray of apple and box of ballpen')
It'll cover the query and return true.
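To see which names such an alternation pattern matches without touching a database, the same idea can be sketched in Python. The product_names list is a stand-in for the products table, and the escaping step is an addition that the Ruby snippet above does not perform:

```python
import re

rows = [[1, "tray of apple", 150.00], [1, "box of ballpen", 70.30]]
product_names = ["apple", "biogesic"]  # stand-in for the products table

# Collect every word from the sentences; escape each one so that any
# regex metacharacters in the data are treated literally
words = [re.escape(word) for _, sentence, _ in rows for word in sentence.split()]
pattern = re.compile("|".join(words))

# A name matches when it contains at least one of the words
matches = [name for name in product_names if pattern.search(name)]
print(pattern.pattern)
print(matches)
```

Note that short words such as "of" will match any name that happens to contain them, so filtering out common words before building the pattern may be worthwhile.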

Counting occurrences of a list item from a list?

(See edit at the bottom of this post)
I'm making a program in Elixir that counts the types of HTML tags from a list of tags that I've already obtained. This means that the key should be the tag and the value should be the count.
e.g. in the following sample file
<html><head><body><sometag><sometag><sometag2><sometag>
My output should be something like the following:
html: 1
head: 1
body: 1
sometag: 3
sometag2: 1
Here is my code:
def tags(page) do
taglist = Regex.scan(~r/<[a-zA-Z0-9]+/, page)
dict = Map.new()
Enum.map(taglist, fn(x) ->
tag = String.to_atom(hd(x))
Map.put_new(dict, tag, 1)
end)
end
I know I should probably be using Enum.each instead, but when I do that my dictionary just ends up empty instead of incorrect.
With Enum.map, this is the output I receive:
iex(15)> A3.test
[%{"<html" => 1}, %{"<body" => 1}, %{"<p" => 1}, %{"<a" => 1}, %{"<p" => 1},
%{"<a" => 1}, %{"<p" => 1}, %{"<a" => 1}, %{"<p" => 1}, %{"<a" => 1}]
As you can see, there are duplicate entries and it's turned into a list of dictionaries. For now I'm not even trying to get the count working, so long as the dictionary doesn't duplicate entries (which is why the value is always just "1").
Thanks for any help.
EDIT: ------------------
Okay so I figured out that I need to use Enum.reduce
The following code produces the output I'm looking for (for now):
def tags(page) do
rawTagList = Regex.scan(~r/<[a-zA-Z0-9]+/, page)
tagList = Enum.map(rawTagList, fn(tag) -> String.to_atom(hd(tag)) end)
Enum.reduce(tagList, %{}, fn(tag, acc) ->
Map.put_new(acc, tag, 1)
end)
end
Output:
%{"<a": 1, "<body": 1, "<html": 1, "<p": 1}
Now I have to complete the challenge of actually counting the tags as I go. If anyone can offer any insight on that, I'd be grateful!
First of all, it is not the best idea to parse HTML with regexes. See this question for more details (especially the accepted answer).
Secondly, you are trying to write imperative code in a functional language (this is about the first version of your code). Variables in Elixir are immutable, so dict will always be an empty map. Enum.map takes a list and always returns a new list of the same length with every element transformed. Your transformation function takes an empty map and puts one key-value pair into it.
As a result you get a list of one-element maps. The line:
Map.put_new(dict, tag, 1)
doesn't update dict in place, but creates a new map from the old one, which is empty. In your example it is exactly the same as:
%{tag => 1}
You have a couple of options to do it differently. The closest approach would be to use Enum.reduce. It takes a list, an initial accumulator and a function (elem, acc) -> new_acc.
taglist
|> Enum.reduce(%{}, fn(tag, acc) -> Map.update(acc, tag, 1, &(&1 + 1)) end)
It looks a little complicated because it uses a couple of pieces of syntactic sugar. taglist |> Enum.reduce(%{}, fun) is the same as Enum.reduce(taglist, %{}, fun). &(&1 + 1) is shorthand for fn(counter) -> counter + 1 end.
Map.update takes four arguments: the map to update, the key to update, the initial value to insert if the key doesn't exist, and a function that is applied to the current value if the key does exist.
So, those two lines of code do this:
iterate over the list (Enum.reduce)
starting with an empty map (%{})
take the current element and the accumulator map (fn(tag, acc)) and either:
  if the key doesn't exist, insert 1
  if it exists, increment its value by one (&(&1 + 1))
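For readers more familiar with Python, the same fold can be sketched with functools.reduce. This is only an illustration of the technique, not a translation the answer provides; count_tags and step are hypothetical names, and note that Python's reducer takes (acc, elem), the reverse of Elixir's fn(tag, acc):

```python
from functools import reduce


def count_tags(tags):
    """Fold a list of tag names into a {tag: count} dict."""
    def step(acc, tag):
        # Return a new dict instead of mutating acc, mirroring Map.update:
        # start at 1 if the key is missing, otherwise add one
        return {**acc, tag: acc.get(tag, 0) + 1}
    return reduce(step, tags, {})


counts = count_tags(
    ["html", "head", "body", "sometag", "sometag", "sometag2", "sometag"]
)
print(counts)
```

Using the tags from the sample file in the question, sometag ends up with a count of 3 and every other tag with 1.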

MySql Separate values in one col to many

I am retrieving data from a MySQL db. All the data is in one column. I need to separate this into several cols. The structure of this col is as follows:
{{product ID=001 |Country=Netherlands |Repository Link=http://googt.com |Other Relevant Information=test }} ==Description== this are the below codes: code 1 code2 ==Case Study== case study 1 txt case study 2 txt ==Benefits== ben 1 ben 2 === Requirements === (empty col) === Architecture === *arch1 *arch2
So I want cols like: Product ID, Country, Repository Link, Architecture etc.....
If you are planning on simply parsing out the output of your column, it will depend on the language of choice you are currently using.
However, in general the procedure for doing this is as follows.
1. Pull the output into a string.
2. Find a delimiter (in your case it appears '|' will do).
3. You have two options here (again depending on the language):
   A. Split each segment into an array, then run the array through a loop to print each section, or use the array to manipulate the data individually (your choice).
   B. With a simple string method, either create a new string or replace all instances of '|' with '\n' (the newline character) so that you can display all the data.
I recommend the array conversion as this will allow you to easily interact with the data in a simple manner.
This is often something done today with json and other such formats which are often stored in single fields for various reasons.
Here is an example done in PHP making use of explode():
$unparsed = "this | is | a | string that is | not: parsed";
// Split on the delimiter, then trim the spaces around each segment
$parsed = array_map('trim', explode("|", $unparsed));
echo $parsed[2]; // prints "a"
echo $parsed[4]; // prints "not: parsed"
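Applied to the column from the question, the array approach could look like the following Python sketch. It only handles the key=value pairs inside the {{ ... }} block, and it assumes that keys never contain '=' and values never contain '|'; parse_template is a hypothetical helper and the sample string is shortened:

```python
def parse_template(text):
    """Extract key=value pairs from the first {{ ... }} block of text."""
    start = text.index("{{") + 2
    end = text.index("}}", start)
    fields = {}
    for segment in text[start:end].split("|"):
        if "=" not in segment:
            continue  # skip empty or malformed segments
        # Split on the first '=' only, so values may contain '=' themselves
        key, value = segment.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


sample = (
    "{{product ID=001 |Country=Netherlands "
    "|Repository Link=http://googt.com "
    "|Other Relevant Information=test }} ==Description== this are the below codes"
)
fields = parse_template(sample)
print(fields)
```

Each key can then be written to its own column. The ==Section== parts after the closing braces would need a separate split on the heading markers.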

Parsing numerical data using Prolog?

I am new to prolog and am considering using it for a small data analysis application. Here is what I am seeking to accomplish:
I have a CSV file with some data of the following form:
a,b,c
d,e,f
g,h,i
...
The data is purely numerical and I need to do the following: 1st, I need to group rows according to the following scheme:
So what's going on above?
I start at the 1st row, which has value 'a' in column one. Then, I keep going down the rows until I hit a row whose value in column one differs from 'a' by a certain amount, 'z'. The process is then repeated, and many "groups" are formed after the process is complete.
For each of these groups, I want to find the mean of columns two and three (as an example, for the 1st group in the picture above, the mean of column two would be: (b+e+h)/3).
I am pretty sure this can be done in prolog. However, I have 50,000+ rows of data and since prolog is declarative, I am not sure how efficient prolog would be at accomplishing the above task?
Is it feasible to work out a prolog program to accomplish the above task, so that efficiency of the program is not significantly lower than a procedural analog?
This snippet could be a starting point for your task:
:- [library(dcg/basics)].
rownum(Z, AveList) :- phrase_from_file(row_scan(Z, [], [], AveList), 'numbers.txt').
row_scan(Z, Group, AveSoFar, AveList) -->
    number(A),",",number(B),",",number(C),"\n",
    { row_match(Z, A,B,C, Group,AveSoFar, Group1,AveUpdated) },
    row_scan(Z, Group1, AveUpdated, AveList).
row_scan(_Z, _Group, AveList, AveList) --> "\n";[].
% row_match(Z, A,B,C, Group,Ave, Group1,Ave1)
row_match(_, A,B,C, [],Ave, [(A,B,C)],Ave).
row_match(Z, A,B,C, [H|T],Ave, Group1,Ave1) :-
    H = (F,_,_),
    (   A - F =:= Z
    ->  aggregate_all(agg(count,sum(C2),sum(C3)),
            member((_,C2,C3), [(A,B,C), H|T]), agg(Count,T2,T3)),
        A2 is T2/Count, A3 is T3/Count,
        Group1 = [], Ave1 = [(A2,A3)|Ave]
    ;   Group1 = [H,(A,B,C)|T], Ave1 = Ave
    ).
with this input
1,2,3
4,5,6
7,8,9
10,2,3
40,5,6
70,8,9
16,0,0
yields
?- rownum(6,L).
L = [ (3.75, 4.5), (5, 6)]
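For comparison with a procedural version, here is a minimal Python sketch of the same grouping rule as the Prolog snippet: a group is closed when a row's first column differs from the group's first row by exactly z, the closing row counts as part of the group, and a trailing unfinished group is dropped. group_means is a hypothetical name, and unlike the Prolog result the means come out in input order rather than reversed:

```python
def group_means(rows, z):
    """Return (mean of column 2, mean of column 3) for each closed group."""
    means = []
    group = []
    for row in rows:
        group.append(row)
        # Close the group when this row is exactly z away from the
        # first row of the group (a single row never closes a group)
        if len(group) > 1 and row[0] - group[0][0] == z:
            n = len(group)
            mean2 = sum(r[1] for r in group) / n
            mean3 = sum(r[2] for r in group) / n
            means.append((mean2, mean3))
            group = []
    return means


rows = [
    (1, 2, 3), (4, 5, 6), (7, 8, 9),
    (10, 2, 3), (40, 5, 6), (70, 8, 9), (16, 0, 0),
]
result = group_means(rows, 6)
print(result)
```

This is a single pass over the rows, so 50,000+ rows should not be a concern for either version.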

phpmyadmin - category as dbname?

I don't know how to word the title for this question, which is why I cannot find the answer on my own.
I'll try to explain what I mean.
We have a database. Then, in the list of databases, there's something like a category? It's unclickable and bold. It contains three databases preceded by an underscore, and each of these databases contains tables.
It looks something like:
Category
> _something1
table1, table2, table3...
> _something2
table1, table2, table3...
> _something3
table1, table2, table3...
What is this called, and how can I achieve the effect above?
Here, "Category" is a prefix for database names. So your database names are
category_something1
category_something2
category_something3
They are displayed in a collapsible/expandable tree.
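To illustrate how grouping by name prefix produces such a tree, here is a small Python sketch. It is only an illustration of the idea and not phpMyAdmin's actual code; group_by_prefix is a hypothetical helper and "mysql" is an extra sample name with no prefix:

```python
def group_by_prefix(names, separator="_"):
    """Group names by the part before the first separator."""
    groups = {}
    for name in names:
        prefix, sep, rest = name.partition(separator)
        children = groups.setdefault(prefix, [])
        if sep:
            # Keep the separator so the child reads like "_something1"
            children.append(sep + rest)
    return groups


names = [
    "category_something1",
    "category_something2",
    "category_something3",
    "mysql",
]
tree = group_by_prefix(names)
print(tree)
```

So to get the same effect, it should be enough to give the databases a common prefix followed by an underscore.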
From what I understood from the question, you want to write a program that will show 6 rows like this:
Category: -Something 1 table1,table2,table3
-Something 2 table1,table2,table3 -Something 3 table1,table2,table3
First of all, you would need to sort the rows in ascending order.
Change something like this:
Array(Category,
Category_something1,
Category_something2_table1,
Category_something2_table2,
Category_something2_table3,
Category_something2,
Category_something3_table1,
Category_something3_table2,
Category_something3_table3,
Category_something3,
Category_something1_table1,
Category_something1_table2,
Category_something1_table3,)
into :
Array(Category,
Category_something1,
Category_something1_table1,
Category_something1_table2,
Category_something1_table3,
Category_something2,
Category_something2_table1,
Category_something2_table2,
Category_something2_table3,
Category_something3,
Category_something3_table1,
Category_something3_table2,
Category_something3_table3,)
Then run a loop that splits each string into 3 parts. If the second value is not present, print the first as a heading/category; otherwise print the second or third value as an item.
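// $names is assumed to hold the sorted array shown above
foreach ($names as $name) {
    // Split the current string into at most 3 parts wherever "_" is present
    $parts = explode("_", $name, 3);
    if (!isset($parts[1])) {
        // No second part: this is the heading/category
        echo "<b>", $parts[0], "</b>";
    } elseif (!isset($parts[2])) {
        // No third part: this is a database
        echo " -", $parts[1];
    } else {
        // All three parts present: this is a table
        echo " -", $parts[2];
    }
}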