Very basic Redis index and search not working - redisearch

I'm starting with Redis (v7) and lettucemod (v3.1.6), and trying to reproduce an index/search sample found in the Redis docs, but I'm not able to get the right result. Below is the sequence of actions taken in redis-cli:
root@redis01:~# redis-server --version
Redis server v=7.0.7 sha=00000000:0 malloc=jemalloc-5.2.1 bits=64 build=2260280010e18db8
root@redis01:~# redis-cli
127.0.0.1:6379> info modules
# Modules
module:name=ReJSON,ver=20008,api=1,filters=0,usedby=[search],using=[],options=[handle-io-errors]
module:name=search,ver=20405,api=1,filters=0,usedby=[],using=[ReJSON],options=[]
127.0.0.1:6379> FT.CREATE itemIdx ON JSON PREFIX 1 36: SCHEMA $.encoding AS encoding TEXT
OK
127.0.0.1:6379> FT._LIST
1) "itemIdx"
127.0.0.1:6379> JSON.SET 36:1 $ '{"encoding":"utf-8"}'
OK
127.0.0.1:6379> JSON.GET 36:1
"{\"encoding\":\"utf-8\"}"
127.0.0.1:6379> FT.SEARCH itemIdx '*'
1) (integer) 1
2) "36:1"
3) 1) "$"
   2) "{\"encoding\":\"utf-8\"}"
127.0.0.1:6379> FT.SEARCH itemIdx '@encoding:utf-8'
1) (integer) 0
127.0.0.1:6379> FT.SEARCH itemIdx '@encoding:(tf)'
1) (integer) 0
Any help would be very appreciated.
Joan.

Let's start with the second query:
127.0.0.1:6379> FT.SEARCH itemIdx '@encoding:(tf)'
1) (integer) 0
It is missing a * to indicate that it is a suffix query (see: Infix/Suffix matching):
127.0.0.1:6379> FT.SEARCH itemIdx '@encoding:*tf'
1) (integer) 1
2) "36:1"
3) 1) "$"
   2) "{\"encoding\":\"utf-8\"}"
As for the first query,
127.0.0.1:6379> FT.SEARCH itemIdx '@encoding:utf-8'
1) (integer) 0
By default, RediSearch treats the following characters as token separators unless they are escaped: ,.<>{}[]"':;!@#$%^&*()-+=~ (to read more about it, see: Tokenization). The value utf-8 is therefore indexed as the two tokens utf and 8.
This is why the following works:
127.0.0.1:6379> FT.SEARCH itemIdx '@encoding:utf'
1) (integer) 1
2) "36:1"
3) 1) "$"
   2) "{\"encoding\":\"utf-8\"}"
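The default handling of these punctuation characters can be illustrated with a small Python sketch. It is only a simplified model, not RediSearch's actual tokenizer: it splits on the separator characters listed above plus whitespace and lowercases the result, and it ignores escaping, stemming and stop words. The name `tokenize` is made up for this illustration.

```python
import re

# Separator characters as listed in the answer (whitespace also separates tokens).
SEPARATORS = ",.<>{}[]\"':;!@#$%^&*()-+=~"

# Character class matching one or more separators or whitespace characters.
_SPLIT_RE = re.compile("[" + re.escape(SEPARATORS) + r"\s]+")


def tokenize(text: str) -> list[str]:
    """Simplified model: split on separators, drop empty pieces, lowercase."""
    return [piece.lower() for piece in _SPLIT_RE.split(text) if piece]


print(tokenize("utf-8"))  # ['utf', '8']
```

Under this model the stored value never contains a single token `utf-8`, which is consistent with the query for `utf` alone finding the document.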
The simple way to work around this default behavior is to escape the hyphen, both in the stored value and in the query:
127.0.0.1:6379> JSON.SET 36:1 $ '{"encoding":"utf\\-8"}'
OK
127.0.0.1:6379> FT.SEARCH itemIdx '@encoding:utf\-8'
1) (integer) 1
2) "36:1"
3) 1) "$"
   2) "{\"encoding\":\"utf\\\\-8\"}"
Other options include setting the slop between the words or changing the default escaping.
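For client code, the escaping step could be wrapped in a small helper. The sketch below is an assumption-laden illustration: `escape_token` is a hypothetical function, not part of lettucemod or any Redis client, and it simply puts a backslash in front of each separator character from the list in the answer.

```python
import json

# Separator characters as listed in the answer above.
SEPARATORS = ",.<>{}[]\"':;!@#$%^&*()-+=~"


def escape_token(value: str) -> str:
    """Prefix every separator character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in SEPARATORS else ch for ch in value)


escaped = escape_token("utf-8")
print(escaped)  # utf\-8

# When the escaped value is stored as JSON, the backslash itself gets escaped
# again by the JSON encoder, which matches the doubled backslash in JSON.SET.
print(json.dumps({"encoding": escaped}))

# The same escaped form is used inside the query string.
print("@encoding:" + escaped)  # @encoding:utf\-8
```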

How to get document ids from FT.AGGREGATE with Redisearch?

I would like to aggregate documents and return only document ids.
How to do it?
With aggregate, the results may have nothing to do with document ids (for example, if you group by some field and then return the group sizes). This is why the aggregate result does not contain the document id. If your specific aggregation does preserve the document id, you can store it inside the document itself as a non-indexed field and then return it.
For example:
127.0.0.1:6379> FT.CREATE idx SCHEMA name TEXT SORTABLE docid TAG SORTABLE NOINDEX
OK
127.0.0.1:6379> FT.ADD idx doc1 1.0 FIELDS name name1 docid doc1
OK
127.0.0.1:6379> FT.ADD idx doc2 1.0 FIELDS name name2 docid doc2
OK
127.0.0.1:6379> FT.ADD idx doc3 1.0 FIELDS name name1 docid doc3
OK
127.0.0.1:6379> FT.AGGREGATE idx * GROUPBY 1 @name REDUCE TOLIST 1 @docid as docids
1) (integer) 2
2) 1) name
   2) "name2"
   3) docids
   4) 1) "doc2"
3) 1) name
   2) "name1"
   3) docids
   4) 1) "doc1"
      2) "doc3"
In recent versions of RediSearch, the expression LOAD {nargs} {property} allows loading the document id using @__key, which can then be referenced in later stages of the aggregation pipeline:
FT.AGGREGATE idx * LOAD 1 @__key GROUPBY 1 @type REDUCE TOLIST 1 @__key as keys
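To see why the id has to travel with the document as a field, here is a plain Python sketch of what the GROUPBY ... REDUCE TOLIST step does to the three sample documents. It is an in-memory illustration only, with no Redis involved; `group_to_list` is a made-up name, and the grouping stage is modeled as seeing nothing but field values.

```python
from collections import defaultdict

# The three sample documents from the FT.ADD commands (field name -> value).
documents = {
    "doc1": {"name": "name1", "docid": "doc1"},
    "doc2": {"name": "name2", "docid": "doc2"},
    "doc3": {"name": "name1", "docid": "doc3"},
}


def group_to_list(docs, group_field, list_field):
    """Group by one field and collect the distinct values of another field."""
    groups = defaultdict(list)
    for fields in docs.values():  # only field values are used, never the keys
        bucket = groups[fields[group_field]]
        if fields[list_field] not in bucket:
            bucket.append(fields[list_field])
    return dict(groups)


print(group_to_list(documents, "name", "docid"))
# {'name1': ['doc1', 'doc3'], 'name2': ['doc2']}
```

Without the `docid` field (or a loaded key), the grouping stage in this model would have nothing to put into the list.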

Got error 'repetition-operator operand invalid' from regexp (Error #1139)

I have a column phone_number in a database table, where an entry may contain more than one phone number. The plan is to identify entries that do not pass a regex validation.
This is the query I am using to accomplish my objective:
SELECT id, phone_number FROM store WHERE phone_number NOT REGEXP '^\s*\(?(020[78]?\)? ?[1-9][0-9]{2,3} ?[0-9]{4})|(0[1-8][0-9]{3}\)? ?[1-9][0-9]{2} ?[0-9]{3})\s*$';
Problem is, every time I run the code, I get an error:
Error Code: 1139. Got error 'repetition-operator operand invalid' from regexp
Thanks in advance.
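The regex you are using has at least 2 issues: 1) the escapes should be doubled, because inside a MySQL string literal \( is reduced to a plain (, so the regex engine sees (? and reports 'repetition-operator operand invalid', and 2) there are 2 groups separated with |, which makes ^ apply only to the first branch and $ only to the second.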
'^\s*\(?(020[78]?\)? ?[1-9][0-9]{2,3} ?[0-9]{4})|(0[1-8][0-9]{3}\)? ?[1-9][0-9]{2} ?[0-9]{3})\s*$'
        ^--------------------------------------^ ^------------------------------------------^
You can use
'^[[:space:]]*\\(?(020[78]?\\)? ?[1-9][0-9]{2,3} ?[0-9]{4}|0[1-8][0-9]{3}\\)? ?[1-9][0-9]{2} ?[0-9]{3})[[:space:]]*$'
Breakdown:
^ - start of string
[[:space:]]* - 0+ whitespaces
\\(? - 1 or 0 ( chars
(020[78]?\\)? ?[1-9][0-9]{2,3} ?[0-9]{4}|0[1-8][0-9]{3}\\)? ?[1-9][0-9]{2} ?[0-9]{3}) - An alternation group matching 2 alternatives:
020[78]?\\)? ?[1-9][0-9]{2,3} ?[0-9]{4} - 020 + optional 7 or 8 + an optional ) + an optional space + a digit from 1 to 9 + 3 or 2 digits + an optional space + 4 digits
| - or
0[1-8][0-9]{3}\\)? ?[1-9][0-9]{2} ?[0-9]{3} - 0 + a digit from 1 to 8 + 3 digits + an optional ) + an optional space + a digit from 1 to 9 + 2 digits + an optional space + 3 digits
[[:space:]]* - 0+ whitespaces
$ - end of string
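The anchoring issue can be checked outside MySQL with a quick Python sketch. This is a translation under assumptions: Python's `re` module does not understand `[[:space:]]`, so `\s` is used instead, and raw strings remove the need to double the backslashes. `re.search` is used because it matches anywhere in the string. The sample phone numbers and the function names are made up for the test.

```python
import re

# Original pattern: ^ binds only to the first branch, $ only to the second.
ORIGINAL = re.compile(
    r"^\s*\(?(020[78]?\)? ?[1-9][0-9]{2,3} ?[0-9]{4})"
    r"|(0[1-8][0-9]{3}\)? ?[1-9][0-9]{2} ?[0-9]{3})\s*$"
)

# Corrected pattern: one group holds both alternatives, so both anchors apply.
FIXED = re.compile(
    r"^\s*\(?(020[78]?\)? ?[1-9][0-9]{2,3} ?[0-9]{4}"
    r"|0[1-8][0-9]{3}\)? ?[1-9][0-9]{2} ?[0-9]{3})\s*$"
)


def matches_original(phone: str) -> bool:
    return ORIGINAL.search(phone) is not None


def matches_fixed(phone: str) -> bool:
    return FIXED.search(phone) is not None


for sample in ["020 7946 0958", "(01234) 567 890", "junk 01234 567890", "12345"]:
    print(repr(sample), matches_original(sample), matches_fixed(sample))
```

The third sample shows the difference: the original pattern accepts it through its second branch even though it starts with junk, while the corrected pattern rejects it.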

How to make exact comparison in mysql?

Consider the following query:
SELECT
('foo' + 1 - 1) = 'foo',
('foo' + 1 - 1),
'foo'
To my surprise, it returns 1, 0, 'foo'.
So how does 0 equal 'foo'?
And how can I write the statement so that ('foo' + 1 - 1) = 'foo' returns false (0)?
The answer is described in the Type Conversion in Expression Evaluation section of the MySQL documentation.
The ('foo' + 1 - 1) expression is evaluated as a number because of the numeric operands and arithmetic operators in it. In this context the string 'foo' is interpreted as 0, so the expression becomes 0+1-1 => 0. This number is then compared with the string 'foo'. Since one operand is a number and the other is a string, the comparison is done as floating-point numbers. In this context the string 'foo' is again converted to 0, and 0 = 0 is true, so you get 1 as a result.
I just found one solution: cast both sides to binary strings, so that they are compared as strings rather than as numbers:
SELECT
binary ('foo' + 1 - 1) = binary 'foo',
('foo' + 1 - 1),
'foo'
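The coercion described in the answer can be imitated in Python to make the two outcomes concrete. This is a rough model, not MySQL's implementation: `to_number`, `mysql_equal` and `as_binary` are made-up names, a string is converted by taking its leading numeric prefix (or 0 if there is none), and collation rules for string-to-string comparison are ignored.

```python
import re

# Leading numeric prefix: optional sign, digits with optional fraction, optional exponent.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def to_number(value) -> float:
    """Model of string-to-number conversion: numeric prefix, else 0."""
    if isinstance(value, (int, float)):
        return float(value)
    match = _NUMERIC_PREFIX.match(value)
    return float(match.group(0)) if match else 0.0


def mysql_equal(left, right) -> bool:
    """Model of '=': number vs. string is compared as floating-point numbers."""
    if isinstance(left, str) and isinstance(right, str):
        return left == right  # string comparison (collation ignored in this sketch)
    return to_number(left) == to_number(right)


def as_binary(value) -> bytes:
    """Model of BINARY: turn the value into a byte string."""
    return value.encode() if isinstance(value, str) else format(value, "g").encode()


lhs = to_number("foo") + 1 - 1            # 'foo' counts as 0, so this is 0.0
print(mysql_equal(lhs, "foo"))            # True: 0 compared with 'foo' -> 0
print(as_binary(lhs) == as_binary("foo")) # False: b'0' is not b'foo'
```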

Cast number of bytes from blob field to number

I have a table with one blob field named bindata. bindata always contains 7 bytes. The first four of them form an integer (unsigned I think, the db is not mine).
My question is: how can I select only the first four bytes from bindata and convert them to a number?
I am new to MySQL, but from the documentation I see that I may have to use the CONV function by doing something like this:
SELECT CONV(<Hex String of first 4 bytes of bindata>,16,10) as myNumber
But I don't have a clue on how to select only the first four bytes of the blob field. I am really stuck here.
Thanks
You can use string functions to extract individual bytes from the blob and combine them with bit shifts (this treats the first byte as the most significant one). For example:
SELECT id,
((ORD(SUBSTR(`data`, 1, 1)) << 24) +
(ORD(SUBSTR(`data`, 2, 1)) << 16) +
(ORD(SUBSTR(`data`, 3, 1)) << 8) +
ORD(SUBSTR(`data`, 4, 1))) AS num
FROM test;
Here is Demo in SQLFiddle
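The arithmetic in that query can be sanity-checked with a few lines of Python. This sketch assumes, like the SQL above, that the first byte is the most significant one (big-endian); the 7-byte sample value and the function name are made up for illustration.

```python
def first_four_bytes_as_uint(blob: bytes) -> int:
    """Combine the first four bytes with shifts, mirroring the SQL expression."""
    b0, b1, b2, b3 = blob[:4]
    return (b0 << 24) + (b1 << 16) + (b2 << 8) + b3


# Hypothetical 7-byte value: 4 bytes of integer followed by 3 other bytes.
sample = bytes([0x00, 0x00, 0x01, 0x02, 0xAA, 0xBB, 0xCC])

print(first_four_bytes_as_uint(sample))   # 258
print(int.from_bytes(sample[:4], "big"))  # 258, same result via the standard library
print(int(sample[:4].hex(), 16))          # 258, the hex-string route from the question
```

If the stored integer turned out to be little-endian instead, the shift amounts would have to be reversed.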

How do I replace NULL with zero in an SSIS expression?

I have two columns ActivityCount and ParentValue. I created an expression:
ROUND((ActivityCount / ParentValue) / 100,16) * 100
But the problem is it returns NULL for some columns and I wanted to replace NULL with 0. I can't seem to find the answers.
Expression:
ISNULL(ParentValue) || (ParentValue == 0) ? 0 : ROUND(ISNULL(ActivityCount) ? 0 : ActivityCount / ParentValue / 100,16) * 100
For readability:
ISNULL(ParentValue) || (ParentValue == 0)
    ? 0
    : ROUND((ISNULL(ActivityCount)
                ? 0
                : ActivityCount / ParentValue) / 100
            , 16) * 100
What it does:
If the ParentValue field is NULL or zero, there is no point in performing the calculation, because it would produce NULL or a divide-by-zero error. So, simply exit the expression by returning 0.
If the ActivityCount field is NULL, replace it with zero before performing the calculation.
Recommendation:
I would recommend using COALESCE in the source query to replace the NULL values with zeroes, and performing the calculation there as well, if possible.
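The branching in this expression can be written out as a small Python function to check the edge cases. This is only a model of the logic, with `None` standing in for NULL; `percentage` is a made-up name and Python's `round` is not guaranteed to behave identically to SSIS ROUND.

```python
from typing import Optional


def percentage(activity_count: Optional[float], parent_value: Optional[float]) -> float:
    """Return 0 for a NULL or zero divisor, otherwise the rounded ratio."""
    if parent_value is None or parent_value == 0:
        return 0.0  # avoid NULL propagation and division by zero
    numerator = 0.0 if activity_count is None else activity_count
    return round(numerator / parent_value / 100, 16) * 100


print(percentage(None, 5))   # NULL ActivityCount is treated as 0
print(percentage(5, None))   # NULL ParentValue short-circuits to 0
print(percentage(5, 0))      # zero ParentValue short-circuits to 0
print(percentage(50, 200))   # regular case
```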
Test with ISNULL (the SSIS function, not the SQL one) and use the conditional operator:
ISNULL(ROUND((ActivityCount / ParentValue) / 100,16) * 100) ? 0 : ROUND((ActivityCount / ParentValue) / 100,16) * 100
Try the following:
ISNULL(ROUND((ActivityCount / ParentValue) / 100,16) * 100) || (ROUND((ActivityCount / ParentValue) / 100,16) * 100)= "" ? 0 : ROUND((ActivityCount / ParentValue) / 100,16) * 100
I would have said:
ISNULL(ActivityCount) || ISNULL(ParentValue) ? 0 : ROUND((ActivityCount/ParentValue)/100,16)*100
That way you are testing the simplest cases that will cause a NULL result rather than calculating the end case multiple times in your conditional.
(That being said, any of these will technically work.)