I've migrated a table from MyISAM to InnoDB but am not getting the same results. The InnoDB table returns a score of 0 where there should be a match. The MyISAM table returns a match for the same term (I kept a copy of the old table so I can still run the same query).
SELECT MATCH (COLUMNS) AGAINST ('+"Term Ex"' IN BOOLEAN MODE) as score
FROM table_myisam
where id = 1;
Returns:
+-------+
| score |
+-------+
| 1 |
+-------+
but:
SELECT MATCH (COLUMNS) AGAINST ('+"Term Ex"' IN BOOLEAN MODE) as score
FROM table
where id = 1;
returns:
+-------+
| score |
+-------+
| 0 |
+-------+
I thought the "Ex" might not have been indexed because innodb_ft_min_token_size was set to 3. I lowered that to 1 and optimized the table, but that had no effect. The column contents are 99 characters long, so I presumed the whole column wasn't indexed because of innodb_ft_max_token_size. I increased that to 150 as well and ran the optimize again, but got the same result.
The only difference between these tables is the engine and the character set. This table is using utf8, the myisam table is using latin1.
Has anyone seen this behavior, or have advice for how to resolve it?
UPDATE:
I added ft_stopword_file="" to my my.cnf and ran OPTIMIZE TABLE table again. This time I got
optimize | note | Table does not support optimize, doing recreate + analyze instead
The query worked after this change. Ex is not a stop word though so not sure why it would make a difference.
A new query that fails though is:
SELECT MATCH (Columns) AGAINST ('+Term +Ex +in' IN BOOLEAN MODE) as score FROM Table where id = 1;
+-------+
| score |
+-------+
| 0 |
+-------+
The "in" causes this to fail, even though it is the next word in my column.
SELECT MATCH (Columns) AGAINST ('+Term +Ex' IN BOOLEAN MODE) as score FROM Table where id = 1;
+--------------------+
| score |
+--------------------+
| 219.30206298828125 |
+--------------------+
I also tried CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;, then updated my.cnf with innodb_ft_server_stopword_table='db/my_stopwords'. I restarted and ran:
show variables like 'innodb_ft_server_stopword_table';
which brought back:
+---------------------------------+---------------------------+
| Variable_name | Value |
+---------------------------------+---------------------------+
| innodb_ft_server_stopword_table | 'db/my_stopwords'; |
+---------------------------------+---------------------------+
so I thought the "in" would no longer cause the query to fail, but it still does. I also tried OPTIMIZE TABLE table again and even ALTER TABLE table DROP INDEX ... and ALTER TABLE table ADD FULLTEXT KEY ..., none of which had any effect.
Second Update
The issue is with the stop words.
$userinput = preg_replace('/\b(a|about|an|are|as|at|be|by|com|de|en|for|from|how|i|in|is|it|la|of|on|or|that|the|this|to|was|what|when|where|who|will|with|und|the|www)\b/', '', $userinput);
resolves the issue, but that doesn't seem like a good solution to me. I'd like a solution that stops the stop words from breaking this in MySQL itself.
Stopword table data:
CREATE TABLE `my_stopwords` (
`value` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
and
Name: my_stopwords
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 0
Avg_row_length: 0
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2019-04-09 17:39:55
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
There are several differences between MyISAM's FULLTEXT and InnoDB's. I think you were caught by the handling of 'short' words and/or stop words. MyISAM will show rows, but InnoDB will fail to.
What I have done when using FT (and after switching to InnoDB) is to filter the user's input to avoid short words. It takes extra effort but gets me the rows desired. My case is slightly different since the resulting query is something like this. Note that I have added + to require the words, but not on words shorter than 3 (my innodb_ft_min_token_size is 3). These searches were for "build a table" and "build the table":
WHERE match(description) AGAINST('+build* a +table*' IN BOOLEAN MODE)
WHERE match(description) AGAINST('+build* +the* +table*' IN BOOLEAN MODE)
(The trailing * may be redundant; I have not investigated that.)
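The input filtering described above could be sketched on the application side roughly like this. The function name build_boolean_query and the min_token_size parameter are made up for illustration; the sketch only reproduces the two example strings and does not escape boolean-mode operator characters that a user might type.

```python
def build_boolean_query(user_input, min_token_size=3):
    """Build a BOOLEAN MODE search string from free-text input.

    Words at least min_token_size long are marked as required ('+') with a
    trailing '*'; shorter words are left optional because they are not in
    the FULLTEXT index and would otherwise make the whole match fail.
    """
    terms = []
    for word in user_input.split():
        if len(word) >= min_token_size:
            terms.append("+" + word + "*")
        else:
            terms.append(word)
    return " ".join(terms)


print(build_boolean_query("build a table"))    # +build* a +table*
print(build_boolean_query("build the table"))  # +build* +the* +table*
```

The resulting string would then be passed to AGAINST (... IN BOOLEAN MODE) as a bound parameter rather than concatenated into the SQL.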
Another approach
Since FT is very efficient at non-short, non-stop words, do the search with two phases, each being optional: To search for "a long word", do
WHERE MATCH(d) AGAINST ('+long +word' IN BOOLEAN MODE)
AND d REGEXP '[[:<:]]a[[:>:]]'
The first part whittles down the possible rows rapidly by looking for 'long' and 'word' (as words). The second part makes sure there is a word a in the string, too. The REGEXP is costly but will be applied only to those rows that pass the first test.
To search just for "long word":
WHERE MATCH(d) AGAINST ('+long +word' IN BOOLEAN MODE)
To search just for the word "a":
WHERE d REGEXP '[[:<:]]a[[:>:]]'
Caveat: This case will be slow.
Note: My examples allow for the words to be in any order, and in any location in the string. That is, this string will match in all my examples: "She was longing for a word from him."
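As a rough sketch of the two-phase idea, the WHERE clause could be assembled in application code like this. The helper name two_phase_where is hypothetical, the %s placeholders assume a driver that uses that parameter style, and "short" here only means shorter than the minimum token size (stop words are not handled).

```python
import re


def two_phase_where(user_input, column="d", min_token_size=3):
    """Return (where_clause, params) for the two-phase search.

    Long words go into a FULLTEXT MATCH (fast); each short word gets its own
    REGEXP word-boundary test, which only runs on rows that passed the MATCH.
    """
    words = user_input.split()
    long_words = [w for w in words if len(w) >= min_token_size]
    short_words = [w for w in words if len(w) < min_token_size]

    clauses, params = [], []
    if long_words:
        clauses.append(f"MATCH({column}) AGAINST (%s IN BOOLEAN MODE)")
        params.append(" ".join("+" + w for w in long_words))
    for w in short_words:
        clauses.append(f"{column} REGEXP %s")
        # Same word-boundary markers as in the examples above.
        params.append("[[:<:]]" + re.escape(w) + "[[:>:]]")
    return " AND ".join(clauses), params


print(two_phase_where("a long word"))
print(two_phase_where("long word"))
print(two_phase_where("a"))
```

Either phase is dropped when it has no words, which matches the three cases listed above.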
Here is a step by step procedure which should have reproduced your problem. (This is actually how you should have written your question.) The environment is a freshly installed VM with Debian 9.8 and Percona Server Ver 5.6.43-84.3.
Create an InnoDB table with a fulltext index and some dummy data:
create table test.ft_innodb (
txt text,
fulltext index (txt)
) engine=innodb charset=utf8 collate=utf8_unicode_ci;
insert into test.ft_innodb (txt) values
('Some dummy text'),
('Text with a long and short stop words in it ex');
Execute a test query to verify that it doesn't yet work the way we need:
select txt
, match(t.txt) against ('+some' in boolean mode) as score0
, match(t.txt) against ('+with' in boolean mode) as score1
, match(t.txt) against ('+in' in boolean mode) as score2
, match(t.txt) against ('+ex' in boolean mode) as score3
from test.ft_innodb t;
Result (rounded):
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0 | 0 | 0
As you see, it's not working with stop words ("+with") or with short words ("+ex").
Create an empty InnoDB table for custom stop words:
create table test.my_stopwords (value varchar(30)) engine=innodb;
Edit /etc/mysql/my.cnf and add the following two lines in the [mysqld] block:
[mysqld]
# other settings
innodb_ft_server_stopword_table = "test/my_stopwords"
innodb_ft_min_token_size = 1
Restart MySQL with service mysql restart
Run the test query again (the result should be the same).
Rebuild the fulltext index with
optimize table test.ft_innodb;
It will actually rebuild the entire table including all indexes.
Execute the test query again. Now the result is:
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0.0906 | 0.0906 | 0.0906
You see it works just fine for me. And it's quite simple to reproduce. (Again - This is how you should have written your question.)
Since your procedure is more chaotic than detailed, it's difficult to say what could have gone wrong for you. For example:
CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
This doesn't say in which database you defined that table. Note that I have prefixed all my tables with the corresponding database. Now consider the following: I change my.cnf and set innodb_ft_server_stopword_table = "db/my_stopwords". Note that there is no such table on my server (not even the schema db exists). I restart the MySQL server and check the new setting with
show variables like 'innodb_ft_server_stopword_table';
This returns:
Variable_name | Value
--------------------------------|----------------
innodb_ft_server_stopword_table | db/my_stopwords
And after optimize table test.ft_innodb; the test query returns this:
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0 | 0 | 0.0906
You see? It's not working with stopwords any more. But it works with short non stop words like "+ex". So make sure, that the table you defined in innodb_ft_server_stopword_table actually exists.
A common technique in searching is to make an extra column with the 'sanitized' string to search in. Then add the FULLTEXT index to that column instead of the original column.
In your case, removing the stopwords is the main difference. But there may also be punctuation that could (should?) be removed. Sometimes hyphenated words, contractions, part numbers, or model numbers cause trouble. They can be modified to change the punctuation or spacing to make them friendlier to the FT requirements and/or the user's flavor of input. Another thing is to add common misspellings of the column's words to the search-string column.
Sure, this is more work than you would like to have to do. But I think it provides a viable solution.
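A minimal sketch of how the 'sanitized' column value might be produced in application code before it is stored. The function name sanitize_for_search is made up, and the stop-word set is simply the list from the regex quoted in the question; the same function would also have to be applied to the user's search input.

```python
import re

# Stop words copied from the preg_replace pattern quoted in the question.
STOPWORDS = {
    "a", "about", "an", "are", "as", "at", "be", "by", "com", "de", "en",
    "for", "from", "how", "i", "in", "is", "it", "la", "of", "on", "or",
    "that", "the", "this", "to", "was", "what", "when", "where", "who",
    "will", "with", "und", "www",
}


def sanitize_for_search(text):
    """Lowercase, turn punctuation into spaces, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[\W_]+", " ", text)  # anything that is not a letter or digit
    words = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(words)


print(sanitize_for_search("Text with a long and short stop words in it, ex!"))
# text long and short stop words ex
```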
I'm trying to query using MySQL FULLTEXT, but unfortunately it's returning empty results even though the table contains the input keyword.
Table: user_skills:
+----+----------------------------------------------+
| id | skills |
+----+----------------------------------------------+
| 1 | Dance Performer,DJ,Entertainer,Event Planner |
| 2 | Animation,Camera Operator,Film Direction |
| 3 | DJ |
| 4 | Draftsman |
| 5 | Makeup Artist |
| 6 | DJ,Music Producer |
+----+----------------------------------------------+
Indexes:
Query:
SELECT id,skills
FROM user_skills
WHERE ( MATCH (skills) AGAINST ('+dj' IN BOOLEAN MODE))
When I run the above query, none of the DJ rows are returned, although three rows in the table contain the value dj.
A full text index is the wrong approach for what you are trying to do. But your specific issue is the minimum word length, which is either 3 or 4 (by default), depending on the engine. This is explained in the documentation, specifically here.
Once you reset the value, you will need to recreate the index.
I suspect you are trying to be clever. You have probably heard the advice "don't store lists of things in delimited strings". But you instead countered "ah, but I can use a full text index". You can, although you will find that more complex queries do not optimize very well.
Just do it right. Create the association table user_skills with one row per user and per skill that the user has. You will find it easier to use in queries, to prevent duplicates, to optimize queries, and so on.
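To illustrate the one-row-per-skill layout, here is a small sketch that splits the delimited strings from the question into (user, skill) pairs. It assumes the id column in the sample data identifies the user; the function name explode_skills is hypothetical.

```python
# Sample rows from the question: (id, comma-separated skills).
rows = [
    (1, "Dance Performer,DJ,Entertainer,Event Planner"),
    (2, "Animation,Camera Operator,Film Direction"),
    (3, "DJ"),
    (4, "Draftsman"),
    (5, "Makeup Artist"),
    (6, "DJ,Music Producer"),
]


def explode_skills(rows):
    """Turn each delimited list into one (user_id, skill) pair per skill."""
    pairs = []
    for user_id, skills in rows:
        for skill in skills.split(","):
            skill = skill.strip()
            if skill:
                pairs.append((user_id, skill))
    return pairs


pairs = explode_skills(rows)
dj_users = [user_id for user_id, skill in pairs if skill.lower() == "dj"]
print(dj_users)  # [1, 3, 6]
```

Once the pairs are stored in their own table, finding the DJ rows is an ordinary indexed equality lookup and no full-text index is involved.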
Your search term is too short,
as the MySQL documentation explains:
Some words are ignored in full-text searches:
Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is three characters for
InnoDB search indexes, or four characters for MyISAM. You can control
the cutoff by setting a configuration option before creating the
index: innodb_ft_min_token_size configuration option for InnoDB search
indexes, or ft_min_word_len for MyISAM.
Boolean full-text searches have these characteristics:
They do not use the 50% threshold.
They do not automatically sort rows in order of decreasing relevance.
You can see this from the preceding query result: The row with the
highest relevance is the one that contains “MySQL” twice, but it is
listed last, not first.
They can work even without a FULLTEXT index, although a search
executed in this fashion would be quite slow.
The minimum and maximum word length full-text parameters apply.
https://dev.mysql.com/doc/refman/5.6/en/fulltext-natural-language.html
https://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html
We need to create an index on the "source Path" column, which already shows as MUL under Key. For example, it holds values like /src/com/Vendor/DTP/Emp/Grd1/Sal/2016/Jan/31-01/Joseph, and we need to search with patterns like '%Sal/2016/Jan%'. The table has almost 10 million records.
Please suggest any idea for performance improvement.
+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| Id          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| Country     | char(3)  | NO   | UNI |         |                |
| source Path | char(20) | YES  | MUL |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+
Unfortunately, a search that starts with % cannot use an index (it has not much to do with being in a composite index).
You have some options though:
The values in your path seem to have actual meaning. The ideal solution would be to use the meta-data, e.g. the month, name, whatever "SAL" stands for, and store it in their own columns or an attribute table, and then query for that meta-data instead. This is obviously only possible in very specific cases where you have the required meta-data for every path, so it is probably not an option here.
You can add a "search table" (e.g. (id, subpath)) that contains all subpaths of your source path, e.g.
'/src/com/Vendor/DTP/Emp/Grd1/Sal/2016/Jan/31-01/Joseph'
'/com/Vendor/DTP/Emp/Grd1/Sal/2016/Jan/31-01/Joseph'
'/Vendor/DTP/Emp/Grd1/Sal/2016/Jan/31-01/Joseph'
...
'/Sal/2016/Jan/31-01/Joseph'
...
'/31-01/Joseph'
'/Joseph'
so 11 rows in your example. It's now possible to use an index on that, e.g. in
...
where exists
(select * from subpaths s
where s.subpath like '/Sal/2016/Jan%' and s.id = outerquery.id)
This relies on knowing the start of your search term. If Sal in your example %Sal/2016/Jan should actually include word endings, e.g. /NoSal/2016/Jan, you would have to modify your input term to remove the first word, so %Sal/2016/Jan% would require you to search for /2016/Jan% (with an index) and then recheck the resultset afterwards if it also fits %Sal/2016/Jan% (see the fulltext option for an example, it has the same "problem" to only look for the beginning of words).
You will have to maintain the search table, which is usually done in a trigger (update the subpath table when you insert, update or delete values in your original table).
Since this is a new table, you cannot combine it (directly) with another index, to e.g. optimize where country = 'A' and subpath like 'Sal/2016/Jan%' if country = 'A' would already get rid of 99.99% of the rows. You may have to check explain for your query if MySQL actually uses the index (because the optimizer can try something different) and then maybe reorganize your query (e.g. use a join or force index).
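The rows of such a search table could be generated as in the following sketch (the function name subpaths is made up). In practice the same logic would live in a trigger or in whatever code maintains the table.

```python
def subpaths(path):
    """Return every suffix of the path that starts at a '/' boundary."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[i:]) for i in range(len(parts))]


rows = subpaths("/src/com/Vendor/DTP/Emp/Grd1/Sal/2016/Jan/31-01/Joseph")
for row in rows:
    print(row)
print(len(rows))  # 11
```

Each returned string would be stored together with the id of the original row, so that a pattern such as '/Sal/2016/Jan%' has a fixed prefix and can use an index on subpath.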
You can use a fulltext search. From the userinput, you would have to generate a query like
select * from
(select * from table
where match(`source Path`) against ('+SAL +2016 +Jan' in boolean mode)) subquery
where `source path` like '%Sal/2016/Jan%'
The fulltext search will not care about the order of the words, so you have to recheck the resultset if it actually is the correct path, but the fulltext search will use the (fulltext) index to speed it up. It will only look for the beginning of words, so similar to the "search table" option, if Sal can be the end of the word, you have to remove it from the fulltext search. By default, only words with at least 3 or 4 letters (depending on your engine) will be added to the index, so you have to set the value of either ft_min_word_len or innodb_ft_min_token_size to whatever fits your requirements.
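Generating the AGAINST expression from the user's input might look like this sketch. The function name like_to_boolean is hypothetical, and it assumes that words consist of ASCII letters and digits only; tokens shorter than the configured minimum length would still be ignored by the index.

```python
import re


def like_to_boolean(like_pattern):
    """Turn a LIKE pattern such as '%Sal/2016/Jan%' into '+Sal +2016 +Jan'.

    Everything that is not a letter or digit (%, /, -, ...) is treated as a
    separator, so no boolean-mode operator characters reach the search string.
    """
    words = re.findall(r"[A-Za-z0-9]+", like_pattern)
    return " ".join("+" + w for w in words)


print(like_to_boolean("%Sal/2016/Jan%"))  # +Sal +2016 +Jan
```

The original pattern is still needed afterwards for the LIKE recheck in the outer query.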
The search table approach is probably the most convenient solution, as it can be used very similar to your current search: you can add the userinput directly in one place (without having to interpret it to create the against (...) expression) and you can also use it easily in other situations (e.g. in something like join table2 on concat(table2.Year,'/',table2.Month,'%') like ...); but you will have to set up the triggers (or however else you maintain the table), which is a little more complicated than just adding a fulltext index.
I have this table with song titles, artist etc.
| id | artist | title | search_tags |
| 1 | miley cyrus | 23 | miley cyrus 23 |
This is my query:
select * from music where match(search_tags) against ('+$search_value*' IN BOOLEAN MODE)
It works fine when I search for miley, but it doesn't show any results when I search for 23.
Note: I'm using MySQL 5.6 with InnoDB on Windows 10 and ft_min_word_len=1
Searching for "23" will not work because the search term is too short.
By default, MySQL only indexes words of at least 3 characters for InnoDB (innodb_ft_min_token_size) or 4 characters for MyISAM (ft_min_word_len). You will need to change that to 1 or 2 for your query to work.
What you need to do here is
Add this line to your my.ini file
innodb_ft_min_token_size = 1
Restart MySQL Service
After the server comes back up, rebuild your tables by issuing a fake ALTER command.
ALTER TABLE table_name ENGINE=INNODB;
Run the query again and it should work :)
Good Luck
Notice that ft_min_word_len is 4 by default. The token 500 has length 3, so it will not be indexed at all. You will have to do three things:
STEP 01 : Open the init file inside mysql folder
Add this to \bin\mysql\mysql5.7.24
innodb_ft_min_token_size = 1
STEP 02 : Restart mysql
STEP 03 : Reindex all indexes in the models table
You could just drop and add the FULLTEXT index
For example - I create database and a table from cli and insert some data:
CREATE DATABASE testdb CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
USE testdb;
CREATE TABLE test (id INT, str VARCHAR(100)) ENGINE=InnoDB CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
INSERT INTO test VALUES (9, 'some string');
Now I can do this and these examples do work (so - quotes don't affect anything it seems):
SELECT * FROM test WHERE id = '9';
INSERT INTO test VALUES ('11', 'some string');
So in these examples I've selected a row using a string for a value that is actually stored as INT in MySQL, and then I inserted a string into a column that is INT.
I don't quite get why this works the way it works here. Why is string allowed to be inserted in an INT column?
Can I insert all MySQL data types as strings?
Is this behavior standard across different RDBMS?
MySQL is a lot like PHP, and will auto-convert data types as best it can. Since you're working with an int field (left-hand side), it'll try to transparently convert the right-hand-side of the argument into an int as well, so '9' just becomes 9.
Strictly speaking, the quotes are unnecessary, and force MySQL to do a typecasting/conversion, so it wastes a bit of CPU time. In practice, unless you're running a Google-sized operation, such conversion overhead is going to be microscopically small.
You should never put quotes around numbers. There is a valid reason for this.
The real issue comes down to type casting. When you put numbers inside quotes, it is treated as a string and MySQL must convert it to a number before it can execute the query. While this may take a small amount of time, the real problems start to occur when MySQL doesn't do a good job of converting your string. For example, MySQL will convert basic strings like '123' to the integer 123, but will convert some larger numbers, like '18015376320243459', to floating point. Since floating point can be rounded, your queries may return inconsistent results. Learn more about type casting here. Depending on your server hardware and software, these results will vary. MySQL explains this.
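The rounding problem can be reproduced outside MySQL. The sketch below assumes that the string is converted to a double-precision float, which is the same format a Python float uses; it illustrates the effect and is not a trace of what the server does internally.

```python
big = "18015376320243459"

# A double has 53 bits of mantissa, so integers this large cannot all be
# represented exactly; the value is rounded to the nearest representable one.
as_float = float(big)
print(int(as_float))          # 18015376320243460
print(int(as_float) == int(big))  # False

# Two different numeric strings can therefore compare as equal.
print(float("18015376320243459") == float("18015376320243461"))  # True
```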
If you are worried about SQL injections, always check the value first and use PHP to strip out any non numbers. You can use preg_replace for this: preg_replace("/[^0-9]/", "", $string)
In addition, if you write your SQL queries with quotes they will not work on databases like PostgreSQL or Oracle.
Check this, you can understand better ...
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | test_no | index | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | NULL | 3126240 | Using where; Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| 1 | SIMPLE | test_no | const | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | const | 1 | Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql>
mysql>
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set, 1 warning (7.94 sec)
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
AFAIK it is standard, but it is considered bad practice because
- using it in a WHERE clause will prevent the optimizer from using indices (explain plan should show that)
- the database has to do additional work to convert the string to a number
- if you're using this for floating-point numbers ('9.4'), you'll run into trouble if client and server use different language settings (9.4 vs 9,4)
In short: don't do it (but YMMV)
This is not standard behavior.
For MySQL 5.5, this is the default SQL mode:
mysql> select @@sql_mode;
+------------+
| @@sql_mode |
+------------+
| |
+------------+
1 row in set (0.00 sec)
ANSI and TRADITIONAL are used more rigorously by Oracle and PostgreSQL. The SQL Modes MySQL permits must be set IF AND ONLY IF you want to make the SQL more ANSI-compliant. Otherwise, you don't have to touch a thing. I've never done so.
It depends on the column type!
if you run
SELECT * FROM `users` WHERE `username` = 0;
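in MySQL/MariaDB, you will get all the records whose username does not start with a non-zero number, because each string is converted to a number before the comparison and non-numeric strings become 0.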
Always quote values if the column is of type string (char, varchar,...) otherwise you'll get unexpected results!
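To see why such a comparison matches so many rows, here is a rough Python emulation of converting a string to a number by its leading numeric prefix. It approximates the behavior described above and is not MySQL's actual implementation; the function name and sample usernames are made up.

```python
import re

# Optional whitespace and sign, then digits with optional fraction/exponent.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def mysql_like_number(text):
    """Return the leading numeric prefix as a float, or 0.0 if there is none."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(0)) if match else 0.0


usernames = ["alice", "bob", "7up", "0day", ""]
matches = [u for u in usernames if mysql_like_number(u) == 0]
print(matches)  # ['alice', 'bob', '0day', '']
```

Only '7up' is excluded, because it is the only value whose numeric prefix is not zero.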
You don't need to quote the numbers but it is always a good habit if you do as it is consistent.
The issue is, let's say that we have a table called users, which has a column called current_balance of type FLOAT, if you run this query:
UPDATE `users` SET `current_balance`='231608.09' WHERE `user_id`=9;
The current_balance field will be updated to 231608, because MySQL rounded the value. Similarly, if you try this query:
UPDATE `users` SET `current_balance`='231608.55' WHERE `user_id`=9;
The current_balance field will be updated to 231609
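The two results can be reproduced with a single-precision float, as in this sketch. It assumes that a FLOAT column stores a 4-byte IEEE value and that the value is displayed with about six significant digits; the helper name as_float32 is made up.

```python
import struct


def as_float32(value):
    """Round-trip a number through a 4-byte IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


for text in ("231608.09", "231608.55"):
    stored = as_float32(float(text))
    # Show the value actually stored and how it looks with 6 significant digits.
    print(text, "->", repr(stored), "->", "%.6g" % stored)
# 231608.09 -> 231608.09375 -> 231608
# 231608.55 -> 231608.546875 -> 231609
```

Under those assumptions the rounding comes from the FLOAT type itself, so the same thing would happen without the quotes; a DECIMAL column is the usual choice for money values.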
Problem: I have a table containing "keywords" associated to a "label".
I have to find the label associated to an input string thanks to a query.
Example:
DB (table):
keywords| label | weight
PLOP | ploplabel | 12
PLOP | ploplbl | 8
TOTO | totolabel | 4
... | ... | ...
Input string : "PLOP 123"
Should return: "ploplabel"
The first query that came to mind, for a partial keyword search, was:
SELECT label FROM table WHERE keywords LIKE "%inputstring%" ORDER BY weight DESC
But as you may have seen, it is the opposite I would need, something like:
SELECT label FROM table WHERE %keywords% LIKE "inputstring" ORDER BY weight DESC
Is it something we can do in MySQL (innoDB === no fulltext)?
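For illustration, the reversed comparison can be written by putting the input string on the left of LIKE and building the pattern from the column. The sketch below uses an in-memory SQLite database as a stand-in so that it runs on its own; in MySQL the pattern would be built with CONCAT('%', keywords, '%') instead of the || operator. The table name kw and the function name best_label are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kw (keywords TEXT, label TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO kw VALUES (?, ?, ?)",
    [("PLOP", "ploplabel", 12), ("PLOP", "ploplbl", 8), ("TOTO", "totolabel", 4)],
)


def best_label(input_string):
    """Return the highest-weight label whose keyword occurs in the input."""
    row = conn.execute(
        "SELECT label FROM kw "
        "WHERE ? LIKE '%' || keywords || '%' "
        "ORDER BY weight DESC LIMIT 1",
        (input_string,),
    ).fetchone()
    return row[0] if row else None


print(best_label("PLOP 123"))  # ploplabel
```

This form cannot use an ordinary index on keywords, so it scans the keyword table; that is usually acceptable only while the table stays small.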
Build your system using innodb and create a myisam fulltext table to index back into your innodb data. That way you get all the advantages of the innodb engine: clustered primary key indexes, row level locking and transactions supplemented by the fulltext capabilities of one or more myisam tables.
Any way to achieve fulltext-like search on InnoDB