SQL Server 2008 CONTAINSTABLE generates negative rank with weighted_term

I have a table with full-text search enabled on the Title column. I am trying to run a weighted search with CONTAINSTABLE, but I get an arithmetic overflow for the Rank value. The query is as follows:
SELECT ID, CAST(Res_Tbl.RANK AS Decimal) AS Relevancy , Title
FROM table1 INNER JOIN
CONTAINSTABLE(table1,Title,'ISABOUT("pétoncle" weight (.8), "pétoncle" weight (.8), "PÉTONCLE" weight (.8))',LANGUAGE 1036 ) AS Res_Tbl
ON ID = Res_Tbl.[KEY]
When I execute this query I get: Arithmetic overflow error for type int, value = -83886083125.000076.
If I remove one of the two ';' characters in the ISABOUT function, the query completes successfully.
Note that the error only occurs when the search returns results; if there are no results, the query completes successfully.
Does anybody know how to solve this?
This question is also on dba.stackexchange.com

Qualifier: Since I can't recreate this, I'm unable to know for sure if this will fix the problem. However, these are some things that I'm seeing.
First off, the ampersand, pound sign, and semicolon are word-break characters. That means that instead of searching for the string "p&#233;toncle", what you're actually searching for is "p", "233", and "toncle". Clearly, that's not your intent.
I have to presume that you have the text "pétoncle" somewhere in your dataset. That means you need that entire string to be complete.
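To make the word-break point concrete, here is a minimal Python sketch. It assumes the search term reached the query HTML-encoded as "p&#233;toncle". The helper rough_word_break is hypothetical and only approximates a word breaker by splitting on anything that is not a letter or digit; SQL Server's real word breakers are language-specific and more involved. Decoding the entity before building the search condition is one possible way around the problem.

```python
import html
import re


def rough_word_break(term: str) -> list[str]:
    """Rough stand-in for a word breaker: split on any non-word character."""
    return [token for token in re.split(r"\W+", term) if token]


encoded = "p&#233;toncle"          # assumed form of the term as it reached the query
decoded = html.unescape(encoded)   # standard-library entity decoding

print(rough_word_break(encoded))   # ['p', '233', 'toncle']
print(rough_word_break(decoded))   # ['pétoncle']
```

If the terms come from a web form or an HTML page, decoding them once at the application boundary keeps the entity characters out of the full-text query entirely.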
There are a few things you can do.
1) Turn off stopwords altogether. You can do that by altering the full-text index.
Note that your database must be set to SQL Server 2008 compatibility (level 100) for this not to generate a syntax error:
ALTER FULLTEXT INDEX ON Table1 SET STOPLIST OFF;
2) Create a new stoplist
If you create an empty StopList, you might be able to add the stopwords that you want or copy the system stoplist and remove the stopwords that you don't want. (I would advise the second approach).
Having said that, I wasn't able to find the & or # in the system stoplist, so they may be hard coded. You may have to simply turn the stoplist off.
3) Change your search to ignore the "p&#233;toncle" case.
If you drop the "p&#233;toncle" from the ISABOUT and change it to "p toncle", it might work:
'ISABOUT("pétoncle" weight (.8), "p toncle" weight (.8))'
Those are just some ideas. Like I said, without being able to access the system or recreate the scenario, we won't be able to help much.
Some more information for your researching pleasure:
Stopwords and Stoplists
Alter Fulltext Index syntax
FullText search using Thesaurus file and special characters

For people who reached this page searching for negative rank results returned by SQL Server, as I did: it turns out this can happen if some of your match terms are too long (beyond some character limit). SQL Server does not complain or raise an error at query time. Instead, the ranking is mostly garbage, with negative ranks for some choices of weights (in my case, especially with low weight values on the overlong terms). Limiting token/word length avoids the problem, which is probably a bug deep inside SQL Server 2008 full-text search.
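One way to apply the "limit token length" advice is to filter terms in application code before the ISABOUT string is built. The Python sketch below is only an illustration: build_isabout and its max_term_len default of 64 are hypothetical, since the post does not say what the actual limit is, and replacing embedded double quotes with spaces is an assumption made to keep the generated condition well-formed.

```python
def build_isabout(weighted_terms, max_term_len=64):
    """Build an ISABOUT(...) search condition, skipping overlong terms.

    max_term_len is a hypothetical cap; the real limit is not stated in the
    post above, so tune it against your own data.
    """
    parts = []
    for term, weight in weighted_terms:
        term = term.replace('"', " ").strip()  # assumption: drop embedded quotes
        if not term or len(term) > max_term_len:
            continue  # skip empty or overlong terms instead of sending them
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight must be between 0.0 and 1.0, got {weight}")
        parts.append(f'"{term}" weight ({weight})')
    if not parts:
        return None
    return "ISABOUT(" + ", ".join(parts) + ")"


condition = build_isabout([("pétoncle", 0.8), ("x" * 200, 0.1)])
print(condition)  # ISABOUT("pétoncle" weight (0.8))
```

The resulting string would then be passed to CONTAINSTABLE as a query parameter rather than concatenated into the SQL text.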

Related

FULLTEXT Relevance in MySQL

I'm learning to set up searches in PHP with MySQL, and I like the idea of FULLTEXT BOOLEAN searches. But there's one part I'm not really sure I understand: Relevance.
According to the manual here, when a word has no operator (plus or minus) before it, "the word is optional, but the rows that contain it are rated higher". But according to an earlier statement on that page, "They do not automatically sort rows in order of decreasing relevance".
So my question is, if they don't do it automatically, how do you manually do it? Or at least, how does one reference this "Relevance"? And if you cannot, then what is the point of them assigning values if the results are not sorted by them?
Just trying to wrap my head around this whole system of BOOLEAN MODE.

Realtime search against only 1 column in Mysql - without any plugins

I found some threads about this, but nothing fits my case.
I have a search field in my mobile app, where after text change, the real time search is running via calling my API.
The search request starts only if there are 3 or more characters entered and is searching ONLY in 1 DB column, called TITLE. So each time the user enters a letter, a query is searching for it.
Currently I have it like this (I know this solution is very bad). $searchedword is the word user entered:
if (!empty($searchedword) && strlen($searchedword) > 2) {
    $searchedword = strtolower($searchedword);
    $sql = "SELECT * FROM TABLE ";
    $result = $mysqli->query($sql);
    $output = '';
    if ($result->num_rows > 0) {
        while ($data = $result->fetch_array()) {
            $title = strtolower($data['title']);
            $content = $data['content'];
            if (strpos($title, $searchedword) !== false) { $output .= $title . ',' . $content; }
        }
    }
}
So this just checks whether the title from the DB contains the searched word. This works very well, but I think it is very bad for performance, because each time the user enters a letter in the search field, all the data from the table is queried and searched for that word.
I want to recreate my code to meet the best performance.
So my first question is, should I add a FULLTEXT INDEX to the TITLE column in DB, will it help or will it just increase the disk space? As I am just searching against 1 column and in this column is just a title (1 or 2 words max).
And second question, what should be the best query for my case and of course with the best performance? As I need to search after each letter which user enters.
Can I use the search this way?
SELECT * FROM TABLE WHERE MATCH (title) AGAINST ('$searchedword' IN NATURAL LANGUAGE MODE)
However it seems, this will return only if the word completely matches the title, but returns nothing when the word is part of the title, so it is not a good solution.
The only solution which works is this:
SELECT * FROM TABLE WHERE title LIKE '%$searchedword%'
but what about performance? And I don't understand how this works, because searchedword are converted to lowercase and I have removed the accents from that word, and the TITLE column in DB has accents and also Uppercase, but this search works very well!
If your title column has a collation like utf8mb4_general_ci, you don't have to worry about dealing with upper case, lower case, and diacritical marks in your MySQL WHERE clauses. MySQL will do it for you. It is really good at handling character sets and collations in all kinds of languages. (Such things are very helpful to Swedish-language users, and the inventors of MySQL are Swedish.)
FULLTEXT with NATURAL LANGUAGE MODE is probably not the right approach for this application. It works on words, not chunks of letters. So it probably won't give you anything until your user has typed a whole word, and not a stop word. And, it is a little squirrely when you search a table with only a few rows. So, that might be a problem if you're just getting started.
It does order the results by the closeness of the match, so the most likely hit is the first one. So, if you know you have a phrase to search, it's good.
For your progressive-search application you may want to use one of these two LIKE queries.
SELECT title FROM tbl WHERE title LIKE CONCAT('$searchedword', '%') /*insecure*/
or this one which is much slower but finds your partial match anywhere in the title, not just at the beginning.
SELECT title FROM tbl WHERE title LIKE CONCAT('%', '$searchedword', '%') /*insecure*/
Avoid running these queries until you have gathered at least a few letters from your user, otherwise you'll get absurdly many results.
In these cases say SELECT title not SELECT *, and create an ordinary index on the title column. That way MySQL can satisfy the whole query from the index, which will make it much faster.
And, use MySQL's WHERE functionality to do the matching. Don't fetch the whole table from MySQL and search it in your php program.
And, use prepared statements. Because cybercreeps.
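The advice above (match in the database, select only the title, index the column, use prepared statements, wait for a few letters) can be sketched as follows. This is a Python illustration that uses the standard-library sqlite3 module as a stand-in for MySQL, so placeholder syntax and collation behaviour will differ in a real PHP/MySQL setup. The helper names escape_like and search_titles, the min_chars and limit parameters, and the sample rows are all hypothetical.

```python
import sqlite3


def escape_like(text, escape="\\"):
    """Escape LIKE wildcards so user input is matched literally."""
    return (text.replace(escape, escape + escape)
                .replace("%", escape + "%")
                .replace("_", escape + "_"))


def search_titles(conn, typed, min_chars=3, limit=20):
    """Prefix search on title using a parameterized query."""
    typed = typed.strip()
    if len(typed) < min_chars:
        return []  # too few letters: skip the query entirely
    pattern = escape_like(typed) + "%"
    rows = conn.execute(
        "SELECT title FROM tbl WHERE title LIKE ? ESCAPE '\\' "
        "ORDER BY title LIMIT ?",
        (pattern, limit),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (title TEXT, content TEXT)")
conn.execute("CREATE INDEX idx_title ON tbl (title)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("Dog food", "a"), ("Dogma", "b"), ("Hot dog", "c"), ("100% wool", "d")],
)

print(search_titles(conn, "dog"))   # ['Dog food', 'Dogma']
print(search_titles(conn, "do"))    # []
```

Because the user's text is bound as a parameter, input such as a stray quote or a percent sign is treated as data rather than as SQL or as a wildcard.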

mysql database field type for search query

I tried searching with different terms and got some answers, but they did not match my requirements.
I am using a sql statement something like below to fetch matching results from MySQL table.
SELECT statements... WHERE keyword_title_field REGEXP 'abc|axy|91store';
My question is:
What data type (e.g. varchar, text) should I choose for the keyword_title_field column in the MySQL table to fetch results quickly without putting much load on the table/server?
My current data type is TEXT because the length of the user-supplied input is unknown. Is this best suited, or should I change it?
Though it's not mandatory but any reference reading along with answer would be great for my understanding.
Here are some things to consider:
When you use a field in conditions such as '=' or a prefix LIKE 'term%', it is important that you put an INDEX on the field. This lets MySQL find matching records via the INDEX instead of checking every record one by one. Note that REGEXP and LIKE '%term%' generally cannot use the index for lookups and still examine every row. So make sure to look into that -> https://www.tutorialspoint.com/mysql/mysql-indexes.htm
The fewer characters allowed in your field, the smaller the INDEX is. You however have variable lengths to consider, so a TEXT is fine (MySQL requires a prefix length when indexing a TEXT column). If you know the maximum length and it's less than 256 characters, use a VARCHAR. Just make sure to index the field.
Note that REGEXP is relatively slow. LIKE '%term%' would be preferred, but that of course depends on your needs. If you only need exact matches on 'abc' OR 'axy' OR '91store', you could consider this query: SELECT statements... WHERE keyword_title_field IN ('abc', 'axy', '91store');
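Before swapping REGEXP for IN, it is worth checking that exact matching is really what you want, because the two are not equivalent: the alternation pattern matches substrings, while IN matches whole values only. The Python sketch below illustrates the difference. It uses sqlite3 and Python's re module as stand-ins for MySQL and its REGEXP operator, and the table name items, the sample rows, and both helper functions are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (keyword_title_field TEXT)")
conn.execute("CREATE INDEX idx_keyword ON items (keyword_title_field)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("abc",), ("abcdef",), ("91store",), ("zzz",)],
)


def exact_matches(conn, keywords):
    """Whole-value match with IN and one placeholder per keyword."""
    keywords = list(keywords)
    if not keywords:
        return []
    placeholders = ", ".join("?" for _ in keywords)
    sql = (
        "SELECT keyword_title_field FROM items "
        f"WHERE keyword_title_field IN ({placeholders}) "
        "ORDER BY keyword_title_field"
    )
    return [row[0] for row in conn.execute(sql, keywords)]


def regex_matches(conn, pattern):
    """Substring match, emulating REGEXP by filtering every row in Python."""
    compiled = re.compile(pattern)
    rows = conn.execute(
        "SELECT keyword_title_field FROM items ORDER BY keyword_title_field"
    )
    return [row[0] for row in rows if compiled.search(row[0])]


print(exact_matches(conn, ["abc", "axy", "91store"]))  # ['91store', 'abc']
print(regex_matches(conn, "abc|axy|91store"))          # ['91store', 'abc', 'abcdef']
```

Here "abcdef" is returned by the pattern match but not by IN, so the rewrite is only safe when the stored values are complete keywords.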

How can I make this SQL non sargable?

I've used an online tool to analyse one of my SQL queries (the query took me ages to make).
My query takes a word (in this example the word is 'dog.') and tries to find it in the 'qa' table. When it finds a match, it joins row data from the login table where login.pid = qa.u.
SELECT login.pid,login.name,
qa.id,qa.end,qa.react,qa.win,qa.stock,qa.num,qa.ratio,qa.u,qa.t,qa.k,qa.swipes,qa.d
FROM login,qa WHERE login.pid=qa.u AND (qa.k LIKE '%dog.%' OR qa.k='.dog.')
ORDER BY qa.d DESC LIMIT 0,15
I understand what the tool is telling me:
Argument with leading wildcard
An argument has a leading wildcard character, such as "%foo". The predicate with
this argument is not sargable and cannot use an index if one exists.
but I don't know how to use an index inside the '()' without damaging or changing the results... could someone please explain how I could use an index in the middle of a query's conditions?
I take it that if this was non-sargable then the result would be faster?
First, learn to use modern join syntax:
SELECT login.pid, login.name,
qa.id, qa.end, qa.react, qa.win, qa.stock, qa.num, qa.ratio, qa.u, qa.t,qa.k, qa.swipes, qa.d
FROM login join
qa
on login.pid = qa.u
WHERE (qa.k LIKE '%dog.%' OR qa.k = '.dog.')
ORDER BY qa.d DESC
LIMIT 0,15;
Basically, "sargable" means that you can use an index on a particular expression (it is not an English word; it is a contraction of "Search ARGument ABLE"). The expression on qa.k cannot use an index.
This may not make a difference, depending on the query plan for the query. For instance, if the engine decides to scan the login table and then lookup values in qa, the index wouldn't help. It helps going the other way, though.
The bad news is that you cannot make this expression sargable in MySQL. The good news is that you can use a full text index to do what you want and possibly more. You can read about them here. One small note is that the default settings ignore short words, up to three letters. So you need to change the default setting if you actually want to search for "dog".
By the way, the following expression can use an index on qa.k:
WHERE (qa.k LIKE 'dog.%' OR qa.k = '.dog.')
(I'm not sure if MySQL actually would use the index, because it sometimes gets confused by or.)
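To see the difference between a sargable and a non-sargable predicate, you can ask the database for its query plan. The Python sketch below uses SQLite's EXPLAIN QUERY PLAN through the sqlite3 module as a stand-in; MySQL's EXPLAIN output looks different and its optimizer makes its own choices, so treat this purely as an illustration. The cut-down qa table, the index name idx_qa_k, and the plan helper are hypothetical, and the NOCASE collation is there only because SQLite needs it before it will consider an index for a prefix LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qa (id INTEGER PRIMARY KEY, k TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_qa_k ON qa (k)")


def plan(where_clause):
    """Return SQLite's plan description for a hard-coded WHERE clause.

    Only pass trusted literals here; this helper concatenates SQL text.
    """
    sql = "EXPLAIN QUERY PLAN SELECT k FROM qa WHERE " + where_clause
    return " | ".join(row[-1] for row in conn.execute(sql))


# Equality and (on typical builds) a prefix LIKE are reported as an index
# SEARCH; a leading wildcard is reported as a SCAN of every entry.
for clause in ("k = '.dog.'", "k LIKE 'dog.%'", "k LIKE '%dog.%'"):
    print(f"{clause:<18} -> {plan(clause)}")
```

The exact wording of the plan lines varies between SQLite versions, which is why no expected output is shown; the thing to look for is SEARCH versus SCAN.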

MySQL Match Fulltext

I'm trying to do a fulltext search with MySQL to match a string. The problem is that it's returning odd results in the first place.
For example, the string 'passat 2.0 tdi' :
AND MATCH (
records_veiculos.titulo, records_veiculos.descricao
)
AGAINST (
'passat 2.0 tdi' WITH QUERY EXPANSION
)
is returning this as the first result (the others are fine) :
Volkswagen Passat Variant 1.9 TDI- ANO 2003
which is incorrect, since there's no "2.0" in this example.
What could it be?
edit: Also, since this will probably be a large database (expecting up to 500,000 records), will this search method be the best choice, or would it be better to install another search engine like Sphinx? Or, if not, how do I show relevant results?
edit2: For the record, despite the question being marked as answered, the problem with the MySQL delimiters persists, so if anyone has a suggestion on how to escape delimiters, it would be appreciated and worth the 500 points at stake. The solution I found to increase the result set was to replace WITH QUERY EXPANSION with IN BOOLEAN MODE, using operators to force the engine to get the words I needed, like:
AND MATCH (
records_veiculos.titulo, records_veiculos.descricao
)
AGAINST (
'+passat +2.0 +tdi' IN BOOLEAN MODE
)
It didn't solve the problem entirely, but at least the relevance of the results has changed significantly.
From the MySQL documentation on Fulltext search:
"The FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example, “ ” (space), “,” (comma), and “.” (period)."
This means that the period is delimiting the 2 and 0. So it's not looking for '2.0'; it's looking for '2' and '0', and not finding them. WITH QUERY EXPANSION is probably causing related words to show up, thus obviating the need for '2' and '0' to be individual words in the result rankings. A minimum word length may also be enforced.
By default, I believe MySQL only indexes and matches words with 4 or more characters. You could also try escaping the period. It might be ignoring it or otherwise using it as a stop character.
What is the match rank that it returns for that? Does the match have to contain all the "words"? My understanding was that it works like Google and only needs to match some of the words.
Having said that, keep in mind the effect of adding WITH QUERY EXPANSION: it automatically runs a second search for "related" words, which may not be what you typed but which the fulltext engine deems probably related.
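The effect described above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not MySQL's actual parser: rough_tokens is a hypothetical helper, the delimiter set is limited to the three characters quoted from the manual, and the default minimum length of 4 is an assumption for illustration.

```python
import re

# Space, comma, and period: the delimiter examples quoted from the manual.
DELIMITERS = r"[ ,.]+"


def rough_tokens(text, min_word_len=4):
    """Split on delimiters, then separate words that meet the minimum length."""
    words = [w for w in re.split(DELIMITERS, text.lower()) if w]
    kept = [w for w in words if len(w) >= min_word_len]
    dropped = [w for w in words if len(w) < min_word_len]
    return kept, dropped


print(rough_tokens("passat 2.0 tdi"))     # (['passat'], ['2', '0', 'tdi'])
print(rough_tokens("passat 2.0 tdi", 3))  # (['passat', 'tdi'], ['2', '0'])
```

With a minimum length of 4, only "passat" survives, which would explain why a row mentioning a Passat with a different engine size can rank first. Lowering the minimum to 3 brings "tdi" back, but "2" and "0" are still lost to the period.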
Relevant Documentation: http://dev.mysql.com/doc/refman/5.1/en/fulltext-query-expansion.html
The "." is what's matching on 2003 in your query results.
If you're going to do searches on 3-character text strings, you should set ft_min_word_len=3 in your MySQL config and restart MySQL. Otherwise, a search for "tdi" will return results with "TDI-" but not with just "TDI", because rows with "TDI-" will be indexed but "TDI" alone will not.
After making that config change, you'll have to rebuild your index on that table. (Warning: your index might be significantly larger now.)