Mysql search for string and number using MATCH() AGAINST() - mysql

I have a problem with the MATCH AGAINST function.
The following queries give me the same result:
SELECT * FROM models WHERE MATCH(name) AGAINST('Fiat 500');
SELECT * FROM models WHERE MATCH(name) AGAINST('Fiat');
How can I search for both strings and numbers in a column with a FULLTEXT index?
Thanks

If you need Fiat and 500 anywhere, in any order, then
SELECT * FROM models WHERE MATCH(name) AGAINST('+Fiat +500' IN BOOLEAN MODE);
If you need Fiat 500 together as a phrase, then
SELECT * FROM models WHERE MATCH(name) AGAINST('+"Fiat 500"' IN BOOLEAN MODE);
If you need Fiat, with 500 optional (rows that also contain 500 rank higher), then
SELECT * FROM models WHERE MATCH(name) AGAINST('+Fiat 500' IN BOOLEAN MODE);
If you need 500, with Fiat optional, then
SELECT * FROM models WHERE MATCH(name) AGAINST('Fiat +500' IN BOOLEAN MODE);
Give it a Try !!!
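If the search string is assembled in application code, the four cases above can be produced by one small helper. This is only a sketch: the function name boolean_query and its parameters are made up for illustration, and the resulting string is meant to be passed to AGAINST(... IN BOOLEAN MODE) as a bound parameter.

```python
def boolean_query(required=(), optional=(), phrase=None):
    """Build a search string for MATCH ... AGAINST (... IN BOOLEAN MODE).

    required: words that must be present (prefixed with +)
    optional: words that only raise the relevance score (no prefix)
    phrase:   exact phrase that must be present (quoted, prefixed with +)
    """
    parts = []
    if phrase:
        parts.append('+"{}"'.format(phrase))
    parts.extend("+" + word for word in required)
    parts.extend(optional)
    return " ".join(parts)


print(boolean_query(required=["Fiat", "500"]))             # +Fiat +500
print(boolean_query(phrase="Fiat 500"))                    # +"Fiat 500"
print(boolean_query(required=["Fiat"], optional=["500"]))  # +Fiat 500
print(boolean_query(required=["500"], optional=["Fiat"]))  # +500 Fiat
```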
UPDATE 2013-01-28 18:28 EDT
Here are the default settings for FULLTEXT searching
mysql> show variables like 'ft%';
+--------------------------+----------------+
| Variable_name | Value |
+--------------------------+----------------+
| ft_boolean_syntax | + -><()~*:""&| |
| ft_max_word_len | 84 |
| ft_min_word_len | 4 |
| ft_query_expansion_limit | 20 |
| ft_stopword_file | (built-in) |
+--------------------------+----------------+
5 rows in set (0.00 sec)
mysql>
Notice that ft_min_word_len is 4 by default. The token 500 has length 3, so it will not be indexed at all. You will have to do three things:
STEP 01 : Configure for smaller string tokens
Add this to /etc/my.cnf
[mysqld]
ft_min_word_len = 1
STEP 02 : Restart mysql
service mysql restart
STEP 03 : Reindex all indexes in the models table
You could just drop and re-add the FULLTEXT index,
or do it in stages on a copy of the table and see in advance how big it will get:
CREATE TABLE models_new LIKE models;
ALTER TABLE models_new DROP INDEX name;
ALTER TABLE models_new ADD FULLTEXT name (name);
ALTER TABLE models_new DISABLE KEYS;
INSERT INTO models_new SELECT * FROM models;
ALTER TABLE models_new ENABLE KEYS;
ALTER TABLE models RENAME models_old;
ALTER TABLE models_new RENAME models;
When you are satisfied this worked, then run
DROP TABLE models_old;
Give it a Try !!!
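To see why 'Fiat 500' and 'Fiat' behave the same under the default settings, it can help to simulate the length filter. The sketch below is an approximation: the regular expression only roughly imitates how MySQL splits text into words, and indexed_tokens is a hypothetical helper, not part of MySQL.

```python
import re


def indexed_tokens(text, min_len=4, max_len=84):
    """Return the words of `text` whose length falls inside the indexable range.

    The defaults mirror ft_min_word_len = 4 and ft_max_word_len = 84
    from the SHOW VARIABLES output above.
    """
    words = re.findall(r"[0-9A-Za-z_']+", text)  # rough word split, an assumption
    return [w for w in words if min_len <= len(w) <= max_len]


print(indexed_tokens("Fiat 500"))             # ['Fiat']
print(indexed_tokens("Fiat 500", min_len=1))  # ['Fiat', '500']
```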

Just in case it helps others: if you're using InnoDB, you need a different set of my.cnf properties.
For example, list them with
show variables like 'innodb_ft%';
and in my.cnf set the following value instead of ft_min_word_len:
innodb_ft_min_token_size=1

Related

Process TEXT BLOBs fields in MySQL line by line

I have a MEDIUMTEXT blob in a table, which contains paths separated by newline characters. I'd like to add a "/" to the beginning of each line if it is not already there. Is there a way to write a query to do this with built-in functions?
I suppose an alternative would be to write a Python script to get the field, convert it to a list, process each line and update the record. There aren't that many records in the DB (about 8K+ rows), so I can take the processing delay, as long as it doesn't lock the entire DB or table.
Either way would be fine. If the second option is recommended, do I need to know about specific locking semantics before getting into this? It would be run on a live prod DB (of course, I'd take a DB snapshot first), so in-place updates without downtime would be best.
Demo:
mysql> create table mytable (id int primary key, t text );
mysql> insert into mytable values (1, 'path1\npath2\npath3');
mysql> select * from mytable;
+----+-------------------+
| id | t |
+----+-------------------+
| 1 | path1
path2
path3 |
+----+-------------------+
1 row in set (0.00 sec)
mysql> update mytable set t = concat('/', replace(t, '\n', '\n/'));
mysql> select * from mytable;
+----+----------------------+
| id | t |
+----+----------------------+
| 1 | /path1
/path2
/path3 |
+----+----------------------+
However, I would strongly recommend storing each path on its own row, so you don't have to think about this. In SQL, each column should store one value per row, not a set of values.
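Note that the UPDATE in the demo prefixes every line, including lines that already start with "/". For the Python route mentioned in the question, the per-field transformation could look like the sketch below. The function name prefix_paths is hypothetical, and leaving empty lines untouched is an assumption about the desired behavior.

```python
def prefix_paths(blob, prefix="/"):
    """Add `prefix` to each line of `blob` unless the line already starts with it."""
    fixed = []
    for line in blob.split("\n"):
        if line == "" or line.startswith(prefix):
            fixed.append(line)           # empty or already prefixed: keep as is
        else:
            fixed.append(prefix + line)
    return "\n".join(fixed)


print(prefix_paths("path1\n/path2\npath3"))
# /path1
# /path2
# /path3
```

The script would then read each row, apply the function, and write the row back only when the value changed, which keeps each UPDATE short.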

FullText Search Innodb Fails, MyIsam Returns Results

I've upgraded a table from MyISAM to InnoDB but am not getting the same results. The InnoDB table returns a score of 0 when there should be a match. The MyISAM table returns a match for the same term (I kept a copy of the old table so I can still run the same query).
SELECT MATCH (COLUMNS) AGAINST ('+"Term Ex"' IN BOOLEAN MODE) as score
FROM table_myisam
where id = 1;
Returns:
+-------+
| score |
+-------+
| 1 |
+-------+
but:
SELECT MATCH (COLUMNS) AGAINST ('+"Term Ex"' IN BOOLEAN MODE) as score
FROM table
where id = 1;
returns:
+-------+
| score |
+-------+
| 0 |
+-------+
I thought the ex might not have been indexed because innodb_ft_min_token_size was set to 3. I lowered that to 1 and optimized the table, but that had no effect. The column contents are 99 characters long, so I presumed the whole column wasn't indexed because of innodb_ft_max_token_size. I increased that to 150 as well and ran the optimize again, but got the same result.
The only differences between these tables are the engine and the character set. This table uses utf8; the MyISAM table uses latin1.
Has anyone seen this behavior, or have advice on how to resolve it?
UPDATE:
I added ft_stopword_file="" to my my.cnf and ran OPTIMIZE TABLE table again. This time I got
optimize | note | Table does not support optimize, doing recreate + analyze instead
The query worked after this change. Ex is not a stop word, though, so I'm not sure why it would make a difference.
A new query that fails, though, is:
SELECT MATCH (Columns) AGAINST ('+Term +Ex +in' IN BOOLEAN MODE) as score FROM Table where id = 1;
+-------+
| score |
+-------+
| 0 |
+-------+
The in causes this to fail, but that is the next word in my table.
SELECT MATCH (Columns) AGAINST ('+Term +Ex' IN BOOLEAN MODE) as score FROM Table where id = 1;
+--------------------+
| score |
+--------------------+
| 219.30206298828125 |
+--------------------+
I also tried CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;, then updated my.cnf with innodb_ft_server_stopword_table='db/my_stopwords'. I restarted and ran:
show variables like 'innodb_ft_server_stopword_table';
which brought back:
+---------------------------------+---------------------------+
| Variable_name | Value |
+---------------------------------+---------------------------+
| innodb_ft_server_stopword_table | 'db/my_stopwords'; |
+---------------------------------+---------------------------+
so I thought the in would no longer cause the query to fail, but it still does. I also tried OPTIMIZE TABLE table again and even ALTER TABLE table DROP INDEX ... and ALTER TABLE table ADD FULLTEXT KEY ..., none of which has had an effect.
Second Update
The issue is with the stop words.
$userinput = preg_replace('/\b(a|about|an|are|as|at|be|by|com|de|en|for|from|how|i|in|is|it|la|of|on|or|that|the|this|to|was|what|when|where|who|will|with|und|the|www)\b/', '', $userinput);
resolves the issue, but that doesn't seem like a good solution to me. I'd like a solution that keeps the stop words from breaking this in MySQL.
Stopword table data:
CREATE TABLE `my_stopwords` (
`value` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
and
Name: my_stopwords
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 0
Avg_row_length: 0
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2019-04-09 17:39:55
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
There are several differences between MyISAM's FULLTEXT and InnoDB's. I think you were caught by the handling of 'short' words and/or stop words. MyISAM will show rows, but InnoDB will fail to.
What I have done when using FT (and after switching to InnoDB) is to filter the user's input to avoid short words. It takes extra effort but gets me the rows desired. My case is slightly different, since the resulting query is something like the ones below. Note that I have added + to require the words, but not on words shorter than 3 (my innodb_ft_min_token_size is 3). These searches were for build a table and build the table:
WHERE match(description) AGAINST('+build* a +table*' IN BOOLEAN MODE)
WHERE match(description) AGAINST('+build* +the* +table*' IN BOOLEAN MODE)
(The trailing * may be redundant; I have not investigated that.)
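The input filtering described above can be sketched in a few lines. This is an illustration only: require_long_words is a hypothetical name, and splitting the input on whitespace is an assumption about how the user's text is tokenized.

```python
def require_long_words(user_input, min_token_size=3):
    """Prefix words of at least `min_token_size` characters with + and append *.

    Shorter words are passed through unchanged so they are not required.
    """
    terms = []
    for word in user_input.split():
        if len(word) >= min_token_size:
            terms.append("+" + word + "*")
        else:
            terms.append(word)
    return " ".join(terms)


print(require_long_words("build a table"))    # +build* a +table*
print(require_long_words("build the table"))  # +build* +the* +table*
```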
Another approach
Since FT is very efficient at non-short, non-stop words, do the search in two phases, each of which is optional. To search for "a long word", do
WHERE MATCH(d) AGAINST ('+long +word' IN BOOLEAN MODE)
AND d REGEXP '[[:<:]]a[[:>:]]'
The first part whittles down the possible rows rapidly by looking for 'long' and 'word' (as words). The second part makes sure there is a word a in the string, too. The REGEXP is costly but will be applied only to those rows that pass the first test.
To search just for "long word":
WHERE MATCH(d) AGAINST ('+long +word' IN BOOLEAN MODE)
To search just for the word "a":
WHERE d REGEXP '[[:<:]]a[[:>:]]'
Caveat: This case will be slow.
Note: My examples allow for the words to be in any order, and in any location in the string. That is, this string will match in all my examples: "She was longing for a word from him."
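A sketch of how an application might split the user's words between the two phases follows. Several things here are assumptions: the helper name two_phase_where, the %s placeholder style (used by common Python MySQL drivers), and that the words are plain alphanumerics, since nothing is escaped for the regular expression. The column name must come from trusted code, not from user input.

```python
def two_phase_where(search, column="d", min_token_size=3, stopwords=frozenset()):
    """Return (where_sql, params) combining a FULLTEXT test with REGEXP tests.

    Long, non-stop words go into one boolean-mode MATCH; every short or
    stop word gets its own word-boundary REGEXP.
    """
    ft_words, regexp_words = [], []
    for word in search.split():
        if len(word) >= min_token_size and word.lower() not in stopwords:
            ft_words.append(word)
        else:
            regexp_words.append(word)

    clauses, params = [], []
    if ft_words:
        clauses.append("MATCH({}) AGAINST (%s IN BOOLEAN MODE)".format(column))
        params.append(" ".join("+" + w for w in ft_words))
    for word in regexp_words:
        clauses.append("{} REGEXP %s".format(column))
        params.append("[[:<:]]" + word + "[[:>:]]")
    return " AND ".join(clauses), params


sql, params = two_phase_where("a long word")
print(sql)     # MATCH(d) AGAINST (%s IN BOOLEAN MODE) AND d REGEXP %s
print(params)  # ['+long +word', '[[:<:]]a[[:>:]]']
```

Putting the MATCH clause first mirrors the answer's intent: the cheap indexed test narrows the rows before the costly REGEXP runs.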
Here is a step by step procedure which should have reproduced your problem. (This is actually how you should have written your question.) The environment is a freshly installed VM with Debian 9.8 and Percona Server Ver 5.6.43-84.3.
Create an InnoDB table with a fulltext index and some dummy data:
create table test.ft_innodb (
txt text,
fulltext index (txt)
) engine=innodb charset=utf8 collate=utf8_unicode_ci;
insert into test.ft_innodb (txt) values
('Some dummy text'),
('Text with a long and short stop words in it ex');
Execute a test query to verify that it doesn't work yet as we need:
select txt
, match(t.txt) against ('+some' in boolean mode) as score0
, match(t.txt) against ('+with' in boolean mode) as score1
, match(t.txt) against ('+in' in boolean mode) as score2
, match(t.txt) against ('+ex' in boolean mode) as score3
from test.ft_innodb t;
Result (rounded):
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0 | 0 | 0
As you see, it's not working with stop words ("+with") or with short words ("+ex").
Create an empty InnoDB table for custom stop words:
create table test.my_stopwords (value varchar(30)) engine=innodb;
Edit /etc/mysql/my.cnf and add the following two lines in the [mysqld] block:
[mysqld]
# other settings
innodb_ft_server_stopword_table = "test/my_stopwords"
innodb_ft_min_token_size = 1
Restart MySQL with service mysql restart.
Run the test query above again (the result should be the same).
Rebuild the fulltext index with
optimize table test.ft_innodb;
It will actually rebuild the entire table, including all indexes.
Execute the test query again. Now the result is:
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0.0906 | 0.0906 | 0.0906
You see it works just fine for me. And it's quite simple to reproduce. (Again - This is how you should have written your question.)
Since your procedure is more chaotic than detailed, it's difficult to say what went wrong for you. For example:
CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
This doesn't say in which database you defined that table. Note that I have prefixed all my tables with the corresponding database. Now consider the following: I change my.cnf and set innodb_ft_server_stopword_table = "db/my_stopwords". Note that there is no such table on my server (not even the schema db exists). I restart the MySQL server and check the new setting with
show variables like 'innodb_ft_server_stopword_table';
This returns:
Variable_name | Value
--------------------------------|----------------
innodb_ft_server_stopword_table | db/my_stopwords
And after optimize table test.ft_innodb; the test query returns this:
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0 | 0 | 0.0906
You see? Searching for stop words no longer works, but short non-stop words like "+ex" still do. So make sure that the table you defined in innodb_ft_server_stopword_table actually exists.
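One more thing that may be worth checking: the SHOW VARIABLES output in the question displays the value with quote characters and a trailing semicolon. If that is literally what the server stored, it cannot name an existing table. The sketch below validates the db_name/table_name form before you rely on it. The helper parse_stopword_table is hypothetical, and restricting names to letters, digits, _ and $ is an assumption (unquoted identifiers only).

```python
def parse_stopword_table(value):
    """Split a 'db_name/table_name' setting into (db, table) or raise ValueError."""

    def is_plain_identifier(name):
        # Assumption: only unquoted identifiers made of letters, digits, _ and $.
        return name != "" and all(c.isalnum() or c in "_$" for c in name)

    db, separator, table = value.partition("/")
    if not separator or not is_plain_identifier(db) or not is_plain_identifier(table):
        raise ValueError("expected db_name/table_name, got {!r}".format(value))
    return db, table


print(parse_stopword_table("test/my_stopwords"))  # ('test', 'my_stopwords')

try:
    parse_stopword_table("'db/my_stopwords';")
except ValueError as error:
    print(error)
```

A value that passes this check still has to be confirmed against the server, for example by selecting from the named table.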
A common technique in searching is to make an extra column with the 'sanitized' string to search in. Then add the FULLTEXT index to that column instead of the original column.
In your case, removing the stopwords is the main difference. But there may also be punctuation that could (should?) be removed. Sometimes hyphenated words, contractions, part numbers or model numbers cause trouble. They can be modified, by changing the punctuation or spacing, to make them friendlier to the FT requirements and/or to the way users type their input. Another option is to add common misspellings of the words in the column to the search-string column.
Sure, this is more work than you would like to have to do. But I think it provides a viable solution.
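As a rough illustration of what filling such a 'sanitized' column might involve, here is a minimal sketch. The function name sanitize_for_search and the specific choices (lowercasing, joining hyphenated parts, dropping apostrophes) are assumptions; the right rules depend on your data and on how users type their searches.

```python
import re


def sanitize_for_search(text, stopwords=frozenset()):
    """Return a simplified copy of `text` intended for a FULLTEXT-indexed column."""
    text = text.lower()
    text = re.sub(r"[-/']", "", text)       # "x-500" -> "x500", "don't" -> "dont"
    words = re.findall(r"[a-z0-9]+", text)  # drops all remaining punctuation
    return " ".join(w for w in words if w not in stopwords)


print(sanitize_for_search("Fiat-500: the new model!", stopwords={"the"}))
# fiat500 new model
```

The same function would have to be applied to the user's search string, so that both sides are normalized the same way.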

Full Text Search not working with numbers

I have this table with song titles, artist etc.
| id | artist | title | search_tags |
| 1 | miley cyrus | 23 | miley cyrus 23 |
This is my query:
select * from music where match(search_tags) against ('+$search_value*' IN BOOLEAN MODE)
It works fine when I search for miley, but it returns no results when I search for 23.
Note: I'm using MySQL 5.6 with InnoDB on Windows 10 and ft_min_word_len=1
Searching for "23" does not work because the token is too short.
By default, InnoDB only indexes tokens of at least 3 characters (innodb_ft_min_token_size = 3); the MyISAM equivalent, ft_min_word_len, defaults to 4. You will need to lower that to 1 or 2 for your query to work.
What you need to do here is
Add this line to your my.ini file
innodb_ft_min_token_size = 1
Restart MySQL Service
After the server comes back up, rebuild your tables by issuing a fake ALTER command.
ALTER TABLE table_name ENGINE=INNODB;
Run the query again and it should work :)
Good Luck
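Separately from the token-size setting, the query in the question pastes $search_value straight into the boolean expression, so any operator characters the user types change the meaning of the search. A sketch of building the '+term*' string safely is shown below. The helper prefix_term is hypothetical, and the operator list is simply the characters shown in ft_boolean_syntax earlier on this page.

```python
# Characters with special meaning in boolean mode (from ft_boolean_syntax).
BOOLEAN_OPERATORS = '+-><()~*:"&|'


def prefix_term(search_value):
    """Turn user input into '+word*' terms, replacing operator characters with spaces."""
    cleaned = "".join(" " if c in BOOLEAN_OPERATORS else c for c in search_value)
    return " ".join("+" + word + "*" for word in cleaned.split())


print(prefix_term("23"))             # +23*
print(prefix_term("miley +cyrus*"))  # +miley* +cyrus*
```

The resulting string would then be passed to AGAINST as a bound parameter rather than concatenated into the SQL text.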
Notice that ft_min_word_len is 4 by default. The token 500 has length 3, so it will not be indexed at all. You will have to do three things:
STEP 01 : Open the ini file inside the mysql folder
Add this to \bin\mysql\mysql5.7.24
innodb_ft_min_token_size = 1
STEP 02 : Restart mysql
STEP 03 : Reindex all indexes in the models table
You could just drop and add the FULLTEXT index

mysql ft_min_word_len change on ubuntu does not work

I am trying to implement full-text search. The first thing I did was set ft_min_word_len = 2 in /etc/mysql/my.cnf:
[mysqld]
#
# * Basic Settings
#
#
# * IMPORTANT
# If you make changes to these settings and your system uses apparmor, you may
# also need to also adjust /etc/apparmor.d/usr.sbin.mysqld.
#
ft_min_word_len = 2
Then I saved the file and restarted the server.
I am aware that if a table already has a FULLTEXT index, I will need to drop and rebuild the index, or repair the table.
But I created the table as follows:
create table
`comments`
( `id` int(11),
`comment` varchar(200),
`iduser` int(11) ,
`date_added` datetime
)
ENGINE=MyISAM;
ALTER TABLE comments
ADD FULLTEXT INDEX comment_index
(comment);
Then I added some comments to the table manually.
When I try to search for something, such as
SELECT * FROM comments WHERE MATCH (comment) AGAINST ('the'); -- "the" is a very common 3-letter word, used here as a test
it returns 0 rows.
However, if I search AGAINST a word of length 4, it works.
I tried to check the ft_ variables as
mysql> show variables like 'ft_%';
+--------------------------+----------------+
| Variable_name | Value |
+--------------------------+----------------+
| ft_boolean_syntax | + -><()~*:""&| |
| ft_max_word_len | 84 |
| ft_min_word_len | 2 |
| ft_query_expansion_limit | 20 |
| ft_stopword_file | (built-in) |
+--------------------------+----------------+
Interestingly, in /etc/mysql/my.cnf I can only see ft_min_word_len; ft_max_word_len is not there. More importantly, searching for words shorter than 4 characters does not work at all.
This is driving me crazy. I'm not sure whether some other config is overriding these settings, and I can't seem to locate one either.
Any help would be appreciated.
Mysql Version in my development machine is
mysql Ver 14.14 Distrib 5.1.63, for debian-linux-gnu (i686) using readline 6.2
I was able to find the issue and the fix.
The full-text search setting was fine: I wanted to search for words of at least 2 characters, and I had ft_min_word_len = 2.
While doing more testing, I found that it matches some short words and ignores others.
Here is an example
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
);
INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','DBMS stands for DataBase ...'),
('How To Use MySQL Well','After you went through a ...'),
('Optimizing MySQL','In this tutorial we will show ...'),
('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
('MySQL vs. YourSQL','In the following database comparison ...'),
('MySQL Security','When configured properly, MySQL ...');
mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('use');
Empty set (0.00 sec)
mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('Use');
Empty set (0.00 sec)
mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('you');
Empty set (0.00 sec)
mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('the');
Empty set (0.00 sec)
mysql> SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('vs.');
Empty set (0.00 sec)
But the following works
mysql> SELECT * FROM articles
-> WHERE MATCH (title,body) AGAINST ('run');
+----+-------------------+-------------------------------------+
| id | title | body |
+----+-------------------+-------------------------------------+
| 4 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2. ... |
+----+-------------------+-------------------------------------+
So the cause is something else: the ft_stopword_file setting, which points to a list of words that are never indexed, so a search for any of them returns nothing.
The list is here http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html
So, to allow searching for any word at least 2 characters long:
Set ft_min_word_len to 2.
Then, in the MySQL config file (/etc/mysql/my.cnf on Debian), add ft_stopword_file='path/to/stopword_file.txt'
We can leave this file empty if needed.
One more thing: after changing these settings we need to restart MySQL, and if we change ft_min_word_len we also need to rebuild the index or repair the table.
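The two filters at play here (minimum length and stop words) can be told apart with a small check like the one below. It is only a sketch: why_ignored is a hypothetical helper, and SAMPLE_STOPWORDS contains just the words from the failing queries above, which are assumed to be in the built-in list because those searches returned nothing. It is not the full list.

```python
# Assumption: taken from the failing searches above, not the complete built-in list.
SAMPLE_STOPWORDS = frozenset({"use", "you", "the", "vs"})


def why_ignored(word, min_word_len=2, stopwords=SAMPLE_STOPWORDS):
    """Return 'too short', 'stop word', or None if the word should be searchable."""
    normalized = word.lower().strip(".,;:!?")
    if len(normalized) < min_word_len:
        return "too short"
    if normalized in stopwords:
        return "stop word"
    return None


for word in ["Use", "the", "vs.", "run", "a"]:
    print(word, "->", why_ignored(word))
# Use -> stop word
# the -> stop word
# vs. -> stop word
# run -> None
# a -> too short
```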

MySQL Update Field with some prefix

I have a table whose number column holds values prefixed with bok- and inv-:
id | number
1 | bok-1
2 | inv-3
3 | bok-2
4 | inv-2
5 | inv-10
6 | bok-3
How can I sort the number field for the rows prefixed with inv-?
In this case the result would be:
id | number
1 | bok-1
2 | inv-1
3 | bok-2
4 | inv-2
5 | inv-3
6 | bok-3
You could just use MySQL's SUBSTRING() function:
ORDER BY CAST(SUBSTRING(number, 5) AS SIGNED)
See it on sqlfiddle.
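The reason for the CAST is that the part after the dash must be compared as a number, not as text. The same idea in application code, as a sketch (the helper name numeric_suffix is made up, and it assumes every value has the form prefix-integer):

```python
def numeric_suffix(number):
    """Return the integer after the dash, e.g. 'inv-10' -> 10."""
    _prefix, _dash, suffix = number.partition("-")
    return int(suffix)


values = ["inv-3", "inv-2", "inv-10"]
print(sorted(values))                      # ['inv-10', 'inv-2', 'inv-3']  (text order)
print(sorted(values, key=numeric_suffix))  # ['inv-2', 'inv-3', 'inv-10']  (numeric order)
```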
However, it would probably be better to store the prefix and integer parts in separate columns, if at all possible:
ALTER TABLE mytable
ADD COLUMN prefix ENUM('bok', 'inv'),
ADD COLUMN suffix INT;
UPDATE mytable SET
prefix = LEFT(number, 3),
suffix = SUBSTRING(number, 5);
ALTER TABLE mytable
DROP COLUMN number;
Basically, you should redesign your database structure. Unfortunately there is no other way to process this efficiently, since the database cannot use an index on the part after the dash. Separating the two parts into 2 fields is the most common practice; otherwise you will run a table scan on every ORDER BY clause.
Edit: Given the additional information from the discussion you had (https://chat.stackoverflow.com/rooms/13241/discussion-between-eggyal-and-gusdecool), it is clear that this is a wrong design and that the operation you are asking for should not be executed at all.
It would be impossible both to implement it without creating a decent structure and to create a solution this way that would be legally OK.